July 2009 Status Update
Thursday, July 2nd, 2009 by Megan Forbes
The CollectionSpace July 2009 status report has been posted on the project wiki.
The CollectionSpace June 2009 status report has been posted on the project wiki.
The CollectionSpace May 2009 status report has been posted on the project wiki.
The CollectionSpace April 2009 status report has been posted on the project wiki.
The CollectionSpace March 2009 status report has been posted on the project wiki.
The CollectionSpace February 2009 status report has been posted on the project wiki.
Big Data and the associated research trends are part of our future here at DS. The recent issue of Nature looks at issues and trends around the topic and, while uneven, has some good material in it that folks should check out. Here’s my blow-by-blow on the sections:
The opening editorial calls for a push to make data annotation a major component of research and of grants. Sound familiar? Let’s hope funders listen.
The section on the next Google trots out a lot of familiar and frankly pretty dull options. Skip it.
Big data: Data wrangling poses an important question about data collection. We might have the sense that there is so much data that it is just a matter of managing it. However, David Goldston notes that there are also huge holes in the dataverse, and that these are a result of political policy. Further, if a political entity controls the data, politics can (and will) shape and filter the data in far-reaching ways.
Cory Doctorow’s gee-whiz piece is irritating (unless you’re into technoporn), and is easy to skip.
A piece on wikiomics is an excellent description of how community can make a difference, and the social dynamics of a collaboratory.
Cliff Lynch has a good piece on what data production projects must do to rationalize their data management, and what services groups like IST/DS must provide to support these projects.
Frankel & Reid present an interesting discussion of mining and visualization, and include a compelling, cautionary note:
“The ingrained habits of highly trained scientists make them rarely as adventurous as these young minds. We think we are on the path to insight when shading reveals contours in 3D renderings, or when bursts of red appear on heat maps, for example. But the algorithms used to produce the graphics may create illusions or embed assumptions. The human visual system creates in the brain an apparent understanding of what a picture represents, not necessarily a picture of the underlying science. Unless we know all the steps from hypothesis to understanding — by conversing with theorists, experimentalists, instrument and software developers, visualization scientists, graphic artists and cognitive psychologists — we cannot be sure whether a display is accurate or misleading.”
The closing essay is human interest and could be skipped in the interest of time. However, it is short, and like the best human interest stories, is surprising and inspiring.
U. of Illinois is getting a chunk of NDIIPP money to develop their BECHAMEL framework, which identifies semantic vulnerabilities in metadata as a means of supporting digital preservation services. What does this mean? Here’s a good quote:
“For example, the meta-data for a digital file—a photo or map or document—might include a field called “creator.” Putting a name like “John Smith” in this field might seem sufficient, but does that really identify the creator of the information? In 50 years will a future researcher be able to pinpoint which of the world’s many “John Smiths” created the information?
BECHAMEL flags risks like that one, or such as numerical values that aren’t accompanied by error ranges.”
There’s only a little more info in the article, but there are some papers on a research page at the UIUC site. David Dubin’s recent paper provides some better details. He describes their earlier BECHAMEL work as “a research environment for proposing and testing theories of the meaning of markup.” It is a Prolog app connected to an RDF store (Kowari, losing favor to Mulgara).
It sounds like part of what they’re doing is recognizing that lots of so-called structured markup (including, in my opinion, lots of RDF) is actually semantic-free and amounts to free-text annotations with some weak hints (e.g., “dc:creator”). The question is whether the project will yield useful tools or just more guidelines that are unrealistic in deployment. Their near-term goal seems to be the conversion of entity references in free text (e.g., in a dc:creator element) to RDF references to vocabularies. Is a reference to the concept of “San Francisco, CA” in a gazetteer more useful than the same free text? Probably. But will an RDF pointer to a FOAF description of “John Smith” be much more useful than the free text? I doubt it. Nevertheless, a project worth watching.
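To make the two risks from the quote a bit more concrete (an ambiguous free-text creator, and a number with no error range), here is a rough Python sketch of that kind of check. It is my own illustration, not BECHAMEL's logic (which is a Prolog app); the field names, the `flag_risks` helper, the example.org URI, and the two rules are all assumptions made up for the example.

```python
import re

# Hypothetical rule: a value that is nothing but a number carries no
# error range or uncertainty information.
BARE_NUMBER = re.compile(r"\s*-?\d+(\.\d+)?\s*")


def looks_like_uri(value):
    """Crude test for a resolvable reference rather than free text."""
    return value.startswith(("http://", "https://", "urn:"))


def flag_risks(record):
    """Return (field, warning) pairs for one flat metadata record.

    `record` maps field names to string values. Both rules below are
    illustrative assumptions, not rules taken from BECHAMEL.
    """
    warnings = []
    for field, value in record.items():
        if field == "creator" and not looks_like_uri(value):
            warnings.append((field, "free-text name; may be ambiguous"))
        elif BARE_NUMBER.fullmatch(value):
            warnings.append((field, "numeric value without error range"))
    return warnings


# Made-up record for illustration only.
record = {
    "creator": "John Smith",
    "elevation_m": "1520",
    "temperature_c": "21.4 ± 0.5",
    "coverage": "http://example.org/gazetteer/san-francisco-ca",
}

for field, warning in flag_risks(record):
    print(f"{field}: {warning}")
```

In this sketch the free-text creator and the bare elevation value get flagged, while the temperature (which carries a range) and the gazetteer URI pass. Flagging is the easy part, of course; deciding which “John Smith” was meant is where the real work is.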
On August 1, Project Bamboo launched a wiki-centric community design effort open to all comers. The structured activities laid out by Bamboo’s program staff aim to organize thousands of ideas and suggestions offered at four similarly-organized workshops in the months following the inception of Bamboo’s planning phase. Community design participants will synthesize the “Workshop One” artifacts (all available for review on the wiki) into a thematic overview of Arts and Humanities scholarship and a set of proposed directions for the Mellon-funded cyberinfrastructure project. Initial suggestions regarding consortial models appropriate to a Project Bamboo implementation phase are also being solicited on the project’s Planning Wiki, http://wikibamboo.uchicago.edu/display/BPUB/Home. The wiki, built on Atlassian’s Confluence platform, is viewable to the world, and anyone who wishes can create an account in order to join the community design effort. Participation in the wiki-focused work is one of four prerequisites for participation in the next face-to-face Project Bamboo workshop, currently scheduled for the week of October 13 on the west coast of the United States (more info at http://projectbamboo.org/join-us).
Nina Simon (who writes the Museum 2.0 blog) recently wrote about her impressions of the IMLS Meeting on Museums and Libraries in the 21st Century that took place last week. The meeting was preliminary to a large report that NAS is commissioning on the subject. It is an interesting survey of the state of attitudes in the industry, from the perspective of someone who wants to see things move forward.
She includes notes on the six topics that the workshop discussed:
Her general observations:
Read the post – it is interesting, and a good introduction to that blog, if you do not know it already.