The purpose of this paper is to examine challenges in acquiring US and international greenhouse gas GIS and remotely sensed data for research and active learning activities focused on climate science. The paper evaluates the acquisition and use of NASA data sources for mapping atmospheric concentrations of greenhouse gases, along with the extent of their data curation, discovery systems, and formal publication using persistent identifiers. A sample of data sets is examined to determine bottlenecks in reuse of these data sets as well as potential instances of inappropriate use. The parameters of scale, resolution, atmospheric column type, lack of cartographic knowledge, and selection of covarying atmospheric variables are found to be areas for potential bottlenecks for novice and moderately experienced users. These issues represent a significant challenge in an e-science workflow, particularly in reuse and data provenance lineages. However, they can be mitigated by effective data information literacy training and support materials.
In 2014, a team of librarians at Brown University began a concerted effort to ingest, describe, and publish scientific data and digital scholarship into the Brown Library’s data repository, the Brown Digital Repository (BDR). The Library targeted outreach towards student, staff, and faculty researchers in the sciences to encourage them to deposit their digital scholarship, such as digital research products related to grants and data related to their publications, into the BDR. This poster presents a snapshot of the types of scholarship that were deposited by scientists during a 2-year period and classifies the nature of these digital objects. The authors looked at the total number of files deposited by scientists over this period and created a tool to classify and categorize these objects in order to characterize the nature of digital scholarship that scientists were depositing. The instrument classified these objects into several categories and subcategories based on concrete criteria. The first category described digital objects associated with a publication. Data in this category were further classified into the subcategories “underlying data” and “supplementary data”. Underlying data included files that contained the results reported in the publication, files necessary for the peer review of the paper’s reported results and/or necessary for replication or reproduction of research results, such as code that was used to analyze results. The supplementary data were files accompanying a publication, including tables, graphs or visualizations that were not able to be included in the paper or were referenced by authors. The second category was files created by student, staff or faculty researchers not related to a publication but could stand alone as scholarly products equivalent to a publication, such as research posters, animations, visualizations, or software. 
The last category described digital collections, and included three subcategories: legacy data, digital libraries, and grants. Legacy data were digital products published by retiring faculty or faculty nearing the end of their research careers. Digital libraries included the published collections of scientific data not associated with a single publication. These collections could be published by individual researchers, a collaborative team, labs, and/or departments, and their purpose is to make these items available for other researchers to access and reuse. Lastly, the grants subcategory contained collections of scientific data and/or other types of digital scholarship associated with a funded project. These collections could be published by individual researchers, a collaborative team, labs, and/or departments, and their purpose is to disseminate items resulting from sponsored research and/or make these grant-funded digital objects available to other researchers and/or the public.
Description: The MBLWHOI Library is a partner in the NSF-funded EarthCube building block, GeoLink, whose goal is to push the boundaries of semantic technology in cross-repository discovery within the geosciences field. GeoLink has created ontology design patterns that link together: the MBLWHOI Library’s Woods Hole Open Access Server (WHOAS); data repositories, including Rolling Deck to Repository (R2R), Biological and Chemical Oceanography Data Management Office (BCO-DMO), Integrated Earth Data Applications (IEDA), Long-Term Ecological Research Network (LTER), DataONE and the International Ocean Discovery Program (IODP); the National Science Foundation (NSF) funded awards; and American Geophysical Union (AGU) conference presentations.
Methods: The Library created WHOAS, a DSpace repository, more than 10 years ago. As part of the GeoLink project, we have been collaborating with @mire, a leader in open source DSpace development, to develop modular code that can be integrated into other DSpace installations to improve linked open data (LOD) functionality.
Expected Outcomes: Our DSpace goals are to provide a built-in SPARQL endpoint for easy access to the data, implement editable authority concepts with URIs, and be able to integrate concepts from other authoritative sources using SPARQL queries. A second layer takes dc-based DSpace triples and constructs triples based on GeoLink patterns, which describe the data in a geoscience context. This provides flexibility and serves as a model for organizations contributing data to multiple LOD communities. This code is freely available to the 1700 registered DSpace repositories worldwide.
Evaluation Method: We will determine the effectiveness with which we can contribute to the GeoLink Knowledgebase.
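To make the "second layer" described above more concrete, the following is a minimal sketch of turning dc-based triples into pattern-style triples. It is not the GeoLink or @mire code: the `GL` namespace, the predicate names such as `hasTitle`, the mapping table, and the sample item are all hypothetical placeholders chosen for illustration, and triples are modeled as plain Python tuples rather than with an RDF library.

```python
# Illustrative sketch only: namespaces, predicates, and data are assumptions.
DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element namespace
GL = "http://example.org/geolink-pattern#"   # placeholder, NOT the real GeoLink namespace

# Hypothetical mapping from Dublin Core predicates to pattern-style predicates.
DC_TO_PATTERN = {
    DC + "title": GL + "hasTitle",
    DC + "creator": GL + "hasCreator",
    DC + "identifier": GL + "hasIdentifier",
}


def construct_pattern_triples(dc_triples):
    """Build a new triple for each input triple whose predicate is mapped.

    Unmapped predicates are skipped, so the original dc-based triples
    remain the complete record and the new triples are an added view.
    """
    return [
        (subject, DC_TO_PATTERN[predicate], obj)
        for (subject, predicate, obj) in dc_triples
        if predicate in DC_TO_PATTERN
    ]


item = "http://example.org/item/1"  # placeholder repository item URI
dc_triples = [
    (item, DC + "title", "Example cruise data set"),
    (item, DC + "creator", "A. Researcher"),
    (item, DC + "date", "2015"),  # no mapping defined, so it is skipped
]
pattern_triples = construct_pattern_triples(dc_triples)
```

Keeping the mapping in a separate table is one way a repository could contribute the same records to more than one LOD community, by swapping in a different table per community.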
View full text of posters presented at the 2016 University of Massachusetts and New England Area Librarian e-Science Symposium
Curating Research Data in DRUM: A workflow and distributed staffing model for institutional data repositories
Objective: The University of Minnesota Libraries launched a data repository in March 2015 called the Data Repository for the University of Minnesota (DRUM). Based on past pilots of data curation services, it was clear that we required a distributed data curation approach to handle the wide variety of data sets that our large, diverse academic community will produce.
Methods: Our procedure and staffing model are currently operational. We review all incoming data sets for their subject area and data type, then assign the new curation task to the appropriate one of six data curators based in five subject areas: sciences, social sciences, GIS/spatial, digital humanities, and health sciences. Curators review the data for usability and quality issues and work directly with the data authors to enrich the submission. Curation approaches include generating custom metadata, arrangement and description of the objects, and file transformation for preservation needs. Metrics such as time taken to curate the data and interactions with data authors are tracked.
Results: The data curation procedures have been tested and refined based on one year of implementation. This poster will visualize the current approach taken, describe the skills needed by curator staff, and share summary statistics of the data curated to date.
Conclusions: An institutional data repository has the burden of collecting a diverse array of digital data, but, with appropriate staffing and careful procedures, each dataset, regardless of its distinctiveness, can be enriched for dissemination and reuse.
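The assignment and tracking steps in the Methods above can be sketched in a few lines of code. This is an illustration under stated assumptions, not DRUM's actual system: the curator names, the one-curator-per-subject roster (the abstract describes six curators across five subjects), and the `CurationTask` fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical roster, simplified to one placeholder curator per subject area.
CURATORS = {
    "sciences": "curator_a",
    "social sciences": "curator_b",
    "GIS/spatial": "curator_c",
    "digital humanities": "curator_d",
    "health sciences": "curator_e",
}


@dataclass
class CurationTask:
    """One incoming data set plus the two metrics named in the abstract."""
    dataset: str
    subject: str
    curator: str
    minutes_spent: int = 0
    author_interactions: int = 0


def assign(dataset, subject):
    """Route a data set to the curator for its subject area."""
    if subject not in CURATORS:
        raise ValueError(f"no curator for subject area: {subject!r}")
    return CurationTask(dataset, subject, CURATORS[subject])


# Example: one submission routed, then its metrics updated during curation.
task = assign("example_dataset.csv", "GIS/spatial")
task.minutes_spent += 45
task.author_interactions += 1
```

Recording the metrics on the task itself is one simple way to produce the summary statistics the poster mentions.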
Objective: To experience the process of using principles of scientific research data management (SRDM) to work with a researcher to create a data management plan (DMP). SRDM is an area where research in the traditional sciences intersects with information science. SRDM guides researchers through all stages of the data life cycle. A DMP, which research funders increasingly require, is a document explaining how a study will progress through the data life cycle. This project was undertaken as part of a class on SRDM through the Simmons College School of Library and Information Science.
Methods: After corresponding via email with a researcher studying the cognitive and linguistic skills of deaf children with autism, a set of questions was created based on an interview instrument developed by the Digital Curation Centre and a Skype interview was conducted. Using the information gathered during the interview and in follow-up emails, as well as knowledge of SRDM principles learned in class and through independent research, a DMP (following National Science Foundation guidelines) was created. Additionally, aspects of the researcher’s study which proved challenging when creating a DMP were identified.
Results: A seven-part DMP was created. Challenging aspects were identified as a set of teaching points. These included: data being collected via video camera; children as subjects; subject IDs; repository requirements.
Conclusions: This project was successful in teaching both this author and the interviewed researcher about SRDM and DMPs. This will improve the cognitive science community’s understanding of the principles and importance of SRDM.
This poster presents the results of a case study completed as part of the Simmons College course Scientific Research Data Management. The case study focuses on research conducted by a computer science laboratory on factors that influence patient mortality, using topic modeling and other computational techniques. Based on the research narrative, data management practices and needs specific to computer science research are mapped to the New England Collaborative Data Management Curriculum Modules for Managing Research Data. These areas of focus will help information professionals to identify the data management challenges presented by computer science research, as well as the tools and techniques recommended to create an effective data management plan. Overall, this case study demonstrates the importance of valuing and managing source code as data, in order to ensure reproducibility of results and open access to data.
Objective: Using the data practices of an astronomy research team that applies the radial velocity method to discover exoplanets as a guiding case example, this poster demonstrates data management practices for multi-team research collaborations in the field of astronomy.
Methods: This project began with a 60-minute interview with the local principal investigator on a research team using the radial velocity method to discover exoplanets. The interview was transcribed, and the transcription was used to determine the data management practices currently in place. Areas for improvement were then identified based on class discussion, lectures, and readings. The New England Collaborative Data Management Curriculum’s Simplified Data Management Plan and the “DCC Curation Lifecycle Model” proved particularly useful in this process.
Results: Using the aforementioned Simplified Data Management Plan as a guide, areas in need of improvement were determined and highlighted, as were areas of success. This poster serves to display these results.
Conclusion: In many ways, the astronomy field is exceptional in terms of data and metadata management; however, challenges still arise when dealing with newer technology. Best practices for management and preservation of programmed algorithms, such as the Python pipeline, continue to develop. The perception of infinite digital storage capacity can lead to poor data curation practices. Overall, the specificity of the astronomy discipline benefits from well-established domain-based practices.
Objective: Librarians are not just using open research tools; they are contributing to, even leading, initiatives that develop these tools. SHARE is one such initiative and is creating a new access and discovery tool which addresses the need to maximize research impact. Most access and discovery tools stifle innovation by keeping information about research behind paywalls or in environments that discourage reuse. SHARE is developing SHARE Notify, an open access and discovery tool that is free and encourages use, reuse, and repurposing. SHARE Notify is a dataset of metadata about research events such as articles, datasets, presentations, grant awards, etc. This poster addresses the purpose of SHARE and the development of SHARE Notify.
Methods: SHARE is funded by Sloan and IMLS; led by ARL and COS; and co-sponsored by the AAU and APLU. SHARE Notify is being developed collaboratively by participants representing libraries, repositories, university administrations, publishers, and non-profit organizations.
Results: In 2015, SHARE released a beta version of SHARE Notify. SHARE Notify harvests metadata from more than 100 content providers including data, institutional, and disciplinary repositories and databases such as CrossRef and PubMed Central. SHARE Notify’s code is freely available on the Open Science Framework. Anyone is free to participate in and build upon SHARE Notify.
During Phase Two (2015-2017) SHARE is enhancing the SHARE Notify dataset by harvesting from more sources, adding more identifiers, working with similar international initiatives on interoperability, and promoting SHARE.
Conclusions: SHARE welcomes your involvement in ensuring that SHARE Notify reaches its full potential.
As undergraduate students, graduate students, and professionals in science continue to conduct research and produce incredible amounts of data by doing so, it has become more and more apparent that data management planning and implementation are of the utmost importance to the stewardship of scientific research data now and into the future. The case study approach to studying how data management occurs to any degree at a given institution is very useful for teaching librarians how to consult on data management and promote e-science in research institutions.
The case study discussed here focuses on the research taking place in a ‘question-based’ ecology and evolutionary biology laboratory at a flagship research university in New England. This research is being conducted on live insect specimens and their flesh samples and utilizes various types of data collecting and producing instruments. The data products contribute to ongoing international research on cicadas. A qualitative interview was conducted to understand the lab’s practices and to write a narrative of the lab’s research story. This narrative was then used to write a data management plan (DMP) that addresses each of the seven standard modules of a DMP. The narrative and subsequent data management plan are beneficial teaching tools for consulting research lab teams on not only how to manage their data, but how to act as the stewards of it.
Objective: This case study aims to identify data management needs in archaeological research by examining one project’s current practices.
Context: Archaeologists working at academic institutions in the United States frequently conduct excavations in foreign countries. Principal investigators are required to comply with permit requirements and laws of the host country, which may pertain to the data collected or published. There are also logistical challenges in obtaining, storing, and sharing data among international collaborators. Tel Kabri was a Middle Bronze Age palace near the Mediterranean coast. Excavations started in the 1980s and apply a range of technologies and methods to gain a holistic understanding of daily life and trade at Kabri.
Methods: An interview instrument, based on the Digital Curation Centre’s Checklist for a Data Management Plan 4.0, was developed and used in an interview with lead staff to focus on understanding the project’s data workflow throughout the data lifecycle.
Results: Recommendations for a Data Management Plan were made: data will be imported to software that can manage multiple file types, assign metadata, and provide versioning control; all data will be duplicated and stored in a U.S.-based repository or cloud-based storage service; re-use is subject to approval of the PIs and may be requested by contacting the PIs or the Israel Antiquities Authority; data in paper notebooks will be digitized; data will be stored in open-source formats where possible; Israel Antiquities Authority will be responsible for storing, archiving, and preserving all materials.
Conclusions: Archaeology as a discipline is centered on the importance of context and data preservation. Partnering with archaeologists may allow LIS professionals to pursue a model for global data services that addresses the complexities of collecting data in foreign countries, incorporating legacy data, and preserving multiple data types.
Objective: As text/data mining (TDM) becomes more prevalent, researchers seek to mine library resources for their projects. Some vendors are including language in their TDM licenses that aims to protect their investments by limiting dissemination and/or retention of TDM data. At the same time, researchers are increasingly being called upon by funding agencies to share and retain data from their projects. This work investigated whether vendor restrictions on TDM data sets from research projects might conflict with funder policies on data sharing and retention.
Methods: Language from existing TDM licenses was compared with guidance from several grant-funding agencies to identify potential conflicts with sharing or retaining data generated in the course of TDM research projects.
Results: Potential incompatibilities between TDM licensing language and funding agency data policies were identified. Vendor limitations on the length of TDM output could conflict with data sharing policies. Data retention is an area of particular concern, as in some cases, funder policies on data retention periods are at odds with TDM licensing terms that require data to be destroyed upon conclusion of the work.
Conclusions: In some cases, language in library vendor TDM licenses is at odds with funding agency policies on data sharing and retention. As support for TDM research continues to evolve, librarians who assist researchers with data management plans should be aware of potential conflicts between vendor TDM licenses and funder data policies on data sharing and preservation.
Objective: One of the two major observed gaps in data management plan services at an aspirant Research University is the selection and implementation of a metadata schema. Previous instructional sessions with metadata focused on theory, but may not have reached learners effectively. To address these needs, three librarians have come together to plan a hands-on course to instruct researchers about the fundamentals of metadata.
Methods: The instruction for faculty, staff, and graduate students will be taught in a single 90-minute workshop. Learners will be given an introduction to metadata before experimenting with a sample data set using OpenRefine and Dublin Core. Following this introduction, attendees will be given the opportunity to use their own dataset to experiment with Dublin Core conventions.
Results (Expected): The course, scheduled for mid-April, is expected to provide learners with an easy-to-understand method of implementing metadata for their research projects. In-class hands-on practice should give attendees enough confidence to use Dublin Core or other metadata schemas in their work.
Conclusions (Expected): We expect that the course will have a higher attendance than previous sections of our Data Management Workshop Series, and we expect that a wider variety of attendees will be present. Without the pressure to commit to a full series, and the promotion of an active learning strategy, we would expect attendees to feel more confident that they have learned and will be able to implement valuable information.
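As an illustration of the kind of hands-on exercise described in the Methods, the sketch below describes a sample data set with a handful of Dublin Core elements and serializes the record as XML. It is not the workshop's actual material: the wrapper element name `metadata`, the helper `dublin_core_record`, and all field values are hypothetical. Only the Dublin Core element namespace and the element names themselves (`title`, `creator`, `date`, `format`, `rights`) come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML record with one dc:* child element per supplied field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Placeholder values standing in for a researcher's own data set.
record = dublin_core_record({
    "title": "Sample stream temperature readings",
    "creator": "Example, Researcher",
    "date": "2016-01-01",
    "format": "text/csv",
    "rights": "CC0 1.0",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Attendees working with their own data could swap in their own values, or add other Dublin Core elements such as `description` or `subject`, without changing the helper.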
Purpose: This poster describes the efforts to integrate RDM (research data management) tools, such as electronic lab notebooks and the DMPTool, into RDM instruction for students at the University of Massachusetts Medical School.
Setting: Flexible Clinical Experiences (FCEs) are short (one-week), student-driven or pre-designed for-credit courses available to third-year medical students at the University of Massachusetts Medical School. FCEs allow students to explore an area of clinical interest, to be exposed to medical specialties, or to pursue further learning in a specific field. The Lamar Soutter Library designed and offered an FCE on data management principles and best practices in FY 2015-2016. Initially designed as a lecture-only course following the New England Collaborative Data Management Curriculum, the FCE was tailored to include both an electronic laboratory notebook system and a data management plan generator.
Description: FCE 3017: Research Data Management Fundamentals is a course intended to introduce medical students to the basic principles and best practices for research data management early in their careers. Participants in the course utilized the LabArchives electronic laboratory notebook system and the DMPTool data management plan generator to complete their coursework. All course materials, including a “Gummi Bear Anatomy Study,” were made available to students through the lab notebook tool. Students collected, documented, and shared data using the same system. The goal of using existing tools was to familiarize students with the practical application of RDM principles in the context of a research project, and to identify resources available to them.
Outcome: Participants in FCE 3017: Research Data Management Fundamentals were able to successfully apply the basic principles of research data management in the context of a research project, while also utilizing and becoming familiar with available resources.
Vasilevsky N, Wirz J, Champieux R, Hannon T, Laraway B, Banerjee K, Shaffer C, and Haendel M. “Lions, Tigers, and Gummi Bears: Springing Towards Effective Engagement with Research Data Management” (2014). Scholar Archive. Paper 3571.
Context: Yale University has a long tradition of data services in the social sciences. To keep up with funding agency requirements, disciplinary community standards, and researcher needs, Yale expanded its research data service offerings to the sciences and medicine.
Background: In 2012, Yale University Library and ITS participated jointly in the eScience Institute offered by the DLF, ARL, and CLIR. This poster summarizes the strategic agenda from that project as well as the formation of our Research Data Consultation Group (RDCG).
Current Services: The Research Data Consultation Group is a collaborative, university-wide group charged with responding to service requests and inquiries from researchers at any stage in the data lifecycle in order to work together on best practices, implement data management services, and help link users to resources. We consult on data management planning; finding and using data; data collection, analysis, and processing; and distributing, sharing, and archiving data.
Aim: The aim of RDCG is to meet the needs of researchers who wish to follow and inform disciplinary best practices, meet funder, university, or contract requirements, and also to streamline consultative workflows across several organizations and departments at Yale.
Future Directions: Future projects for RDCG include expanded education and training offerings for library, ITS, teaching and learning support staff, and departmental staff, in addition to Yale students, faculty, postdocs, and other researchers. Simultaneously, we will continue our efforts to clarify and support university and other requirements for data management, sharing, and preservation.
Purpose: This poster examines the collaboration between an academic library, various university departments, and an open source data repository to help faculty and affiliated researchers curate, share and archive research data.
Brief Description: The Harvard Dataverse (https://dataverse.harvard.edu)--powered by the Dataverse Project, open source data publishing software, developed at Harvard’s Institute for Quantitative Social Science (IQSS) for nearly a decade--has recently been collaborating with Harvard Library, Harvard Medical School, Harvard-Smithsonian Center for Astrophysics (CfA), and other groups from the university to provide a data repository solution for sharing, publishing and archiving research data for Harvard faculty and affiliated researchers. This collaboration has expanded the scope of the Dataverse Project, data repository open source software, to better support research data beyond just the social sciences. The Harvard Dataverse team has also extended its services to provide user support, training, and targeted data curation services to the Harvard community.
Results/Outcome: Current and upcoming collaborative projects include: connecting faculty publications with their underlying research data by integrating Dataverse with Harvard’s institutional repository Digital Access to Scholarship at Harvard (DASH); extending metadata support for astronomy with the Center for Astrophysics (CfA) and biomedical datasets with the Harvard Medical School (HMS); providing university-wide open data awareness and support via the Harvard Open Data Assistance Program (ODAP); making licensed datasets available to the Harvard community (Harvard Subscription Data Dataverse); helping researchers meet the requirements of funder-mandated data management plans through customized DMPTool services; and making faculty datasets more widely discoverable by exporting metadata (MARC) into the Harvard Library Catalog, HOLLIS.
Evaluation Method: Site metrics will be used to measure whether usage increases, including the number of new datasets and the number of dataset downloads and views.
Team Teaching, Humor, and Informal Polling Techniques in NECDMC-Based Research Data Management Workshops at Brandeis University
Objective: This poster will detail a few pedagogical techniques incorporated into a series of workshops on research data management. These techniques were chosen to better engage workshop participants by making the material more individually relevant and relatable.
Methods: For the second semester in which workshops on research data management (derived from the NECDMC modules) were offered through the Brandeis University Library & Technology Services department, the librarian sought to make the sessions more engaging through several pedagogical techniques. A team teaching approach was employed by inviting senior members of the Technology Help Desk and Hardware Repair Shop to collaborate on and co-teach the workshops. Humor was employed strategically through icons, disaster stories, and select xkcd webcomic strips within the lecture and slides. Informal polling and direct encouragement to share personal anecdotes during the workshop sessions promoted active engagement.
Results: Workshop participants were visibly more engaged and asked more questions than participants in the first semester's workshops, before these pedagogical techniques were employed, and their questions were more directly relevant to the presented material.
Conclusions: Active and engaged learning techniques are difficult to employ in what are essentially one-shot hour-long instruction sessions, particularly when participants are largely unfamiliar with the material at hand. But by integrating some techniques to capture participants' interest and encourage them to make connections to their own experiences, we observed a deeper understanding and appreciation of the material.
Objective: Building awareness of new library services on any campus can be difficult, especially when these services are deemed “non-traditional.” To help us overcome this challenge, the Boston University Libraries partnered with the Mozilla Science Lab to launch a new initiative called “Study Group” on campus. This poster describes Mozilla’s community building philosophy, the initial results of this partnership, and the technologies we have used.
Methods: The Mozilla Science Lab launched its Study Group initiative during the spring of 2015 to help researchers practice open science through community-led, technology-driven workshops. Group members lead each workshop in an informal, approachable way that encourages members to be both teachers and learners. This approach has created a venue for librarians to engage with graduate students as peers and has opened new two-way communication channels. Additionally, by using open technologies advocated by Mozilla, like GitHub and Gitter, the library now engages researchers on the platforms they already prefer.
Results: Our first two events this spring had ten and twelve participants respectively and we have another ten events scheduled. Last fall we held six events with a total of fifty-five participants. This spring we have held nine events with a total of forty-four participants. Of note, library staff have led only two sessions. This is an important achievement because it has limited our investment in staff time while still allowing us to achieve beneficial outreach results. Less tangibly, participants are beginning to view the library as a partner in open research, as a resource for data sharing, and as a more technology-driven organization. Finally, outside of staff time our total investment (including launching bu.edu/study) has been $240 – predominantly for posters and other outreach materials.
Conclusions: Partnering with the Mozilla Science Lab has helped the library engage with graduate researchers in the sciences through community building in a peer-to-peer format. This relationship has helped both the BU Libraries and Mozilla achieve their respective goals to engage researchers in open research practices in a mutually beneficial way.
Objectives: The Research Data Management Roundtables are a collaborative effort to address the shared challenges relating to data services and the role of the library. Outside of listservs, conferences, or other multi-day events, there are few means to discuss issues related to beginning or sustaining data management initiatives among organizations, libraries, and librarians. The Roundtables are one method to address this issue.
Methods: The New England eScience Program (NN/LM NER) sponsored two Roundtable discussions in 2015. Each was a one-day event, and included an activity based in the local area and the Roundtable discussion. Around twenty librarians from multiple New England institutions met at central locations in Amherst and Worcester, Massachusetts. The committee members facilitated the discussions with guidelines and pre-determined topics.
Results: “Outcomes” from the two discussions were posted to the e-Science Portal for New England Librarians Community Blog. The first Roundtable focused on “organizational structures for research data management services at our institutions,” and the second discussed “engaging faculty and graduate student researchers at our institutions.” Feedback on the most useful aspects of the Roundtables included: “questions were very helpful to guide the discussion”; “hearing other librarians’ ideas and successes, and commiserating about weaknesses and challenges”; and “learning I’m not the only one with questions.”
Conclusions: The Roundtables have been successful professional development for librarians, allowing them to share experiences and training with other librarians in order to broaden their knowledge about research data services (RDS). Suggestions for the future include discussions on: data policies, policy development, and examples; how to gain traction with administration; and collaborating across campus.
Objective: New York University (NYU) Libraries provide research data services to diverse communities across several campuses. Until recently, these services have worked mostly independently of each other. At the main campus, NYU Data Services offers workshops, individual and group consultations, and traveling “road shows” on data management to the larger NYU community. At a separate medical center campus, the NYU Health Sciences Library (NYUHSL) supports a data catalog, data management education, and individualized lab support. Finally, Databrary, which is connected to NYU’s Digital Library Technology Services, provides a repository for behavioral and learning science researchers working primarily with video data to store, manage, and share the raw materials of their work with their colleagues. This poster will discuss how these disparate services have worked more closely together by identifying overlap, making connections between service offerings, and sharing knowledge and resources around data. This initiative enriches the overall mission and strategy of NYU Libraries to serve its student and research communities.
Methods: To ensure the better coordination of these data services, we began to hold regular, bi-monthly meetings to discuss strategies for improving data education material, integrating an institutional data catalog created by NYUHSL with main campus systems, and providing data-related outreach to institutional stakeholders. These groups have also collaborated on planning and hosting events on data-related topics including using Databrary, reproducibility in science, and data visualization. Finally, a resource sharing system was instituted across campuses for library faculty to collaborate and improve upon the instructional design of data management education, create outreach materials, and share ongoing project documentation.
Results: The new collaboration between NYU Data Services, NYUHSL and special projects like Databrary has served to break down existing institutional silos to provide better research and educational data services to NYU’s student and research communities. This collaboration has been essential for improving upon existing services, identifying new opportunities to support the data needs of institutional stakeholders, and providing increased levels of outreach. By fostering a better understanding of what data services are available across campuses through this ongoing collaboration, we are better able to identify and support our communities’ data needs.
Conclusion: Providing data management, curation, and storage services for a diverse and dynamic research community on campus is a demanding task that requires a distributed effort. Each service fills different gaps for researchers at varying stages of their research practices, though without inter-department communication there was decidedly less impact and reach by everyone. By collaborating and opening a line of communication, we have built a better understanding of how we can interact to provide stronger support to the student and research communities across campuses.