Climate Indicators and Data Provenance

How does a scientist know whether an available data set can support their work? This study examines how researchers from different disciplines and practical contexts (e.g., graduate students, faculty researchers, federal research scientists) use information about the sources and analysis of data, also known as provenance, when presented with indicators in an online system. It addresses the research question: can coupling climate-related indicators with data provenance support scientific innovation and science translation? The study draws on web credibility research and on boundary object theory, which focuses on the role of artifacts (such as images) in translation and communication across the boundaries of social groups, as a theoretical lens to inform and direct our inquiry. In this pilot study, we examine how such artifacts can support innovation and translation in the National Climate Indicator System (NCIS). Through a multi-stage research design, we hope to discover principles for optimizing the presentation of data for scientific advancement and translation. Packaging an integrated data product (indicators) with its provenance appears to be a valuable strategy for improving researchers' ability to creatively consider the utility of data and information from other domains in their own work. This research is a collaboration with Dr. Melissa Kenney of the Earth System Science Interdisciplinary Center, supported by a seed grant from the UMD ADVANCE Program for Inclusive Excellence (NSF award HRD...

ADVANCE seed grant awarded

It’s official: the Open Knowledge Lab’s newest project, a study of how researchers assess data, has been funded under the UMD ADVANCE seed grant program! Lab Director Wiggins will work with Dr. Melissa Kenney and her team on a study of climate indicators (data visualizations with brief text descriptions and links to provenance describing the data sources and analysis processes) and how scientists assess the data when these pieces of content are delivered in different ways.

Right now, there’s a big push for scientific data to be shared and re-used, but sharing data effectively is harder than it sounds. First, there’s a lot of “extra work” involved, and the payoff to the sharer isn’t always obvious or direct. Second, without that extra work (or even in spite of it), using data collected by someone else is often simply harder from an analytical standpoint, even if it saves a great deal of time and money on data collection.

There are many reasons that re-using scientific data is challenging, but the first hurdle is figuring out whether the data set in front of you will be useful at all. This remains a significant problem in data discovery, so we hope the results of this study can help reduce this critical bottleneck to effective data discovery and use. At the end of the day, if representing data sets in a particular way conveys their value to potential data consumers more effectively, then it would clearly be worth the relatively small added effort required to...