Cliff Lynch: Humanities workflow as a sensor network
February 4th, 2008 by Patrick Schmitz
I sat in on Cliff’s recent Friday seminar, where he presented a few intriguing ideas (as he is wont to do). In particular, he was exploring the idea of humanities workflows as related to sensor network management. He (like us) has been thinking about what it means to describe a workflow for the humanities, and he is comparing it to the kind of systems used in geoscience and oceanography. A recent model that is in favor in these sciences (it may or may not scale, but that is beside the point) specifies a low rate of sensing until something “interesting” happens, and then an increased rate of observation to capture the interesting details. Cliff proposed that humanities research might be seen to follow a similar model: the academic scans various sources for potential utility and, when something related or interesting turns up, dives in and looks deeper and more carefully.
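For the programmers in the audience, the "sense slowly until something interesting happens" model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything Cliff presented: the function name `adaptive_sample`, the `slow_every` parameter, and the `is_interesting` test are all made up for the example.

```python
def adaptive_sample(readings, is_interesting, slow_every=5):
    """Return the indices of the readings that actually get observed.

    While nothing interesting is happening, only every `slow_every`-th
    reading is observed. Once an observed reading is interesting, every
    following reading is observed until one turns out not to be.
    """
    observed = []
    burst = False  # True while we are in "look closely" mode
    for i, value in enumerate(readings):
        if burst or i % slow_every == 0:
            observed.append(i)
            # Stay in (or enter) burst mode only if this reading is interesting.
            burst = is_interesting(value)
    return observed


# Hypothetical data: a quiet stream with a short spike in the middle.
readings = [0, 0, 0, 0, 0, 9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0]
print(adaptive_sample(readings, lambda v: v > 5))
```

Note the obvious weakness, which applies to the researcher as much as to the sensor: anything interesting that happens entirely between two slow samples is missed.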
This sparked a tangent in my thinking: perhaps it is also related to materials processing (e.g., in manufacturing). A given factory has various sources of inputs and must evaluate these to maximize its own output quality at a reasonable cost. It seems to me that much of research (of most sorts, but especially information processing such as in the humanities) sounds rather like this: evaluating the quality of input from sources, considering the cost (usually time, but sometimes effort), and switching sources from time to time (e.g., when one finds a new journal or research group with promising content). All this with an eye to maximizing the output quality (an academic’s own research).
So what? Perhaps there are lessons to be learned from the modeling (optimization, etc.) that has gone into these respective disciplines. This assumes that in aggregate people act somewhat like enterprises, the evaluation of which is left as an exercise for the reader.
Cliff also talked about documenting workflow as digital provenance, and the difference between workflow languages that seek to abstract the work (so that the workflow can be shared and reused) and documentation systems that serve to capture experimental or processing details (including data sources, software versions, etc.). We discussed the coming need to understand what constitutes a significant alteration of a processing flow (e.g., does a minor rev of software in a lab workbench change the experiment in a substantive way?). Appears to be a promising area of research.
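To make the "significant alteration" question concrete, here is a minimal sketch of a provenance record and one possible comparison rule. Everything in it is an assumption for illustration: the `ProvenanceRecord` class, its fields, and especially the policy (a change of data sources or of a major version number counts as significant, a minor rev does not). Whether that policy is the right one is exactly the open question.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical record of what went into one processing run."""
    data_sources: list
    # Tool name -> version string of the form "major.minor".
    software_versions: dict = field(default_factory=dict)


def major(version):
    """Return the major component of a "major.minor" version string."""
    return version.split(".")[0]


def is_significant_change(old, new):
    """Assumed policy: different inputs, a different set of tools, or a
    major version bump changes the experiment; a minor rev does not."""
    if sorted(old.data_sources) != sorted(new.data_sources):
        return True
    if old.software_versions.keys() != new.software_versions.keys():
        return True
    return any(
        major(old.software_versions[name]) != major(new.software_versions[name])
        for name in old.software_versions
    )


baseline = ProvenanceRecord(["corpus-a"], {"workbench": "2.1"})
minor_rev = ProvenanceRecord(["corpus-a"], {"workbench": "2.2"})
major_rev = ProvenanceRecord(["corpus-a"], {"workbench": "3.0"})

print(is_significant_change(baseline, minor_rev))
print(is_significant_change(baseline, major_rev))
```

The record corresponds to the "documentation system" side of the distinction above: it captures the details of one run rather than abstracting the workflow for reuse.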
Cliff also mentioned the MyExperiment project, which lets contributors post scientific workflows and share them with a community. It is an interesting idea, and it underscores the importance of formalized workflow in the scientific disciplines (especially those using the lab workbench tools). It looks to me like yet another discipline is beginning to look more like software (following the path of hardware design, which uses CAD systems with elaborate linked libraries, not to mention FPGA devices).
Cliff mentioned the issues of repositories and versioning, noting that archives tend to want an object only when it is “dead” (no longer changing). He mentioned the tension between saving a few versions of interest and the cost of preserving an auto-save version generated every few minutes. I suggested that even a nominal charge would take care of much of this, as people would then balance cost and benefit and moderate their submissions. A related issue came up about authorial control over the submissions: on the one hand it would be nice to automate dissemination of materials (e.g., to journals, peer review mechanisms, etc.), but on the other hand an author may want to control this so that peers do not see “premature” work. This reminds me of similar challenges faced by software developers who want to check in interim (or branch) versions that are not yet ready for integration into the main trunk of development.
I guess when you have a hammer, lots of things start to look like nails…
Tags: Cliff Lynch, digital provenance, Friday Seminar, repositories, workflow
