Objective: Using the data practices of an astronomy research team that applies the radial velocity method to discover exoplanets as a guiding case example, this poster demonstrates data management practices for multi-team research collaborations in astronomy.
Methods: This project began with a 60-minute interview with the local principal investigator on a research team using the radial velocity method to discover exoplanets. The interview was transcribed, and the transcript was used to determine the data management practices currently in place. Areas for improvement were then identified based on class discussion, lectures, and readings. The New England Collaborative Data Management Curriculum’s Simplified Data Management Plan and the “DCC Curation Lifecycle Model” proved particularly useful in this process.
Results: Using the aforementioned Simplified Data Management Plan as a guide, areas in need of improvement were identified and highlighted, as were areas of success. This poster displays these results.
Conclusion: In many ways, the astronomy field is exceptional in terms of data and metadata management; however, challenges still arise when dealing with newer technology. Best practices for management and preservation of programmed algorithms, such as the Python pipeline, continue to develop. The perception of infinite digital storage capacity can lead to poor data curation practices. Overall, the astronomy discipline benefits from well-established domain-based practices.
Objective: Librarians are not just using open research tools; they are contributing to, and even leading, initiatives that develop these tools. SHARE is one such initiative and is creating a new access and discovery tool that addresses the need to maximize research impact. Most access and discovery tools stifle innovation by keeping information about research behind paywalls or in environments that discourage reuse. SHARE is developing SHARE Notify, an open access and discovery tool that is free and encourages use, reuse, and repurposing. SHARE Notify is a dataset of metadata about research events such as articles, datasets, presentations, and grant awards. This poster addresses the purpose of SHARE and the development of SHARE Notify.
Methods: SHARE is funded by Sloan and IMLS; led by ARL and COS; and co-sponsored by the AAU and APLU. SHARE Notify is being developed collaboratively by participants representing libraries, repositories, university administrations, publishers, and non-profit organizations.
Results: In 2015, SHARE released a beta version of SHARE Notify. SHARE Notify harvests metadata from more than 100 content providers including data, institutional, and disciplinary repositories and databases such as CrossRef and PubMed Central. SHARE Notify’s code is freely available on the Open Science Framework. Anyone is free to participate in and build upon SHARE Notify.
During Phase Two (2015-2017) SHARE is enhancing the SHARE Notify dataset by harvesting from more sources, adding more identifiers, working with similar international initiatives on interoperability, and promoting SHARE.
Conclusions: SHARE welcomes your involvement in ensuring that SHARE Notify reaches its full potential.
As undergraduate students, graduate students, and professionals in science continue to conduct research and produce large amounts of data, it has become increasingly apparent that data management planning and implementation are essential to the stewardship of scientific research data now and into the future. The case study approach to studying how data management occurs at a given institution is very useful for teaching librarians how to consult on data management and promote e-science in research institutions.
The case study discussed here focuses on the research taking place in a ‘question-based’ ecology and evolutionary biology laboratory at a flagship research university in New England. This research is being conducted on live insect specimens and their flesh samples and uses various types of data-collecting and data-producing instruments. The data products contribute to ongoing international research on cicadas. A qualitative interview was conducted to understand the lab’s practices and to write a narrative of the lab’s research story. This narrative was then used to write a data management plan (DMP) that addresses each of the seven standard modules of a DMP. The narrative and subsequent data management plan are beneficial teaching tools for consulting with research lab teams not only on how to manage their data, but also on how to act as its stewards.
Objective: This case study aims to identify data management needs in archaeological research by examining one project’s current practices.
Context: Archaeologists working at academic institutions in the United States frequently conduct excavations in foreign countries. Principal Investigators are required to comply with permit requirements and laws of the host country, which may pertain to the data collected or published. There are also logistical challenges in obtaining, storing, and sharing data among international collaborators. Tel Kabri was a Middle Bronze Age palace near the Mediterranean coast. Excavations, which started in the 1980s, apply a range of technologies and methods to gain a holistic understanding of daily life and trade at Kabri.
Methods: An interview instrument, based on the Digital Curation Centre’s Checklist for a Data Management Plan 4.0, was developed and used in an interview with lead staff to focus on understanding the project’s data workflow throughout the data lifecycle.
Results: Recommendations for a Data Management Plan were made: data will be imported to software that can manage multiple file types, assign metadata, and provide version control; all data will be duplicated and stored in a U.S.-based repository or cloud-based storage service; re-use is subject to approval of the PIs and may be requested by contacting the PIs or the Israel Antiquities Authority; data in paper notebooks will be digitized; data will be stored in open-source formats where possible; the Israel Antiquities Authority will be responsible for storing, archiving, and preserving all materials.
Conclusions: Archaeology as a discipline is centered on the importance of context and data preservation. Partnering with archaeologists may allow LIS professionals to pursue a model for global data services that addresses the complexities of collecting data in foreign countries, incorporating legacy data, and preserving multiple data types.
Objective: As text/data mining (TDM) becomes more prevalent, researchers seek to mine library resources for their projects. Some vendors are including language in their TDM licenses that aims to protect their investments by limiting dissemination and/or retention of TDM data. At the same time, researchers are increasingly being called upon by funding agencies to share and retain data from their projects. This work investigated whether vendor restrictions on TDM data sets from research projects might conflict with funder policies on data sharing and retention.
Methods: Language from existing TDM licenses was compared with guidance from several grant-funding agencies to identify potential conflicts with sharing or retaining data generated in the course of TDM research projects.
Results: Potential incompatibilities between TDM licensing language and funding agency data policies were identified. Vendor limitations on the length of TDM output could conflict with data sharing policies. Data retention is an area of particular concern, as in some cases, funder policies on data retention periods are at odds with TDM licensing terms that require data to be destroyed upon conclusion of the work.
Conclusions: In some cases, language in library vendor TDM licenses is at odds with funding agency policies on data sharing and retention. As support for TDM research continues to evolve, librarians who assist researchers with data management plans should be aware of potential conflicts between vendor TDM licenses and funder data policies on data sharing and preservation.
Objective: One of the two major observed gaps in data management plan services at an aspirant Research University is the selection and implementation of a metadata schema. Previous instructional sessions on metadata focused on theory, but may not have reached learners effectively. To address these needs, three librarians have come together to plan a hands-on course to instruct researchers about the fundamentals of metadata.
Methods: The instruction for faculty, staff, and graduate students will be taught in a single 90-minute workshop. Learners will be given an introduction to metadata before experimenting with a sample data set using OpenRefine and Dublin Core. Following this introduction, attendees will be given the opportunity to use their own dataset to experiment with Dublin Core conventions.
Results (Expected): The course, scheduled to be taught in mid-April, is expected to provide learners with an easy-to-understand method of implementing metadata for their research projects. In-class hands-on practice should give attendees enough confidence to use Dublin Core or other metadata schemas in their work.
Conclusions (Expected): We expect that the course will have a higher attendance than previous sections of our Data Management Workshop Series, and we expect that a wider variety of attendees will be present. Without the pressure to commit to a full series, and with the promotion of an active learning strategy, we expect attendees to feel more confident that they have learned valuable information and will be able to implement it.
Purpose: This poster describes the efforts to integrate RDM (research data management) tools, such as electronic lab notebooks and the DMPTool, into RDM instruction for students at the University of Massachusetts Medical School.
Setting: Flexible Clinical Experiences (FCEs) are short (one-week), student-driven or pre-designed for-credit courses available to third-year medical students at the University of Massachusetts Medical School. FCEs allow students to explore an area of clinical interest, to be exposed to medical specialties, or to pursue further learning in a specific field. The Lamar Soutter Library designed and offered an FCE on data management principles and best practices in FY 2015-2016. Initially designed as a lecture-only course following the New England Collaborative Data Management Curriculum, the FCE was tailored to include both an electronic laboratory notebook system and a data management plan generator.
Description: FCE 3017: Research Data Management Fundamentals is a course intended to introduce medical students to the basic principles and best practices for research data management early in their careers. Participants in the course utilized the LabArchives electronic laboratory notebook system and the DMPTool data management plan generator to complete their coursework. All course materials, including a “Gummi Bear Anatomy Study,” were made available to students through the lab notebook tool. Students collected, documented, and shared data using the same system. The goal of using existing tools was to familiarize students with the practical application of RDM principles in the context of a research project, and to identify resources available to them.
Outcome: Participants in FCE 3017: Research Data Management Fundamentals successfully applied the basic principles of research data management in the context of a research project, while also using and becoming familiar with available resources.
Vasilevsky N, Wirz J, Champieux R, Hannon T, Laraway B, Banerjee K, Shaffer C, and Haendel M. “Lions, Tigers, and Gummi Bears: Springing Towards Effective Engagement with Research Data Management” (2014). Scholar Archive. Paper 3571.
Context: Yale University has a long tradition of data services in the social sciences. To keep up with funding agency requirements, disciplinary community standards, and researcher needs, Yale expanded its research data service offerings to the sciences and medicine.
Background: In 2012, Yale University Library and ITS participated jointly in the eScience Institute offered by the DLF, ARL, and CLIR. This poster summarizes the strategic agenda from that project as well as the formation of our Research Data Consultation Group (RDCG).
Current Services: The Research Data Consultation Group is a collaborative, university-wide group charged with responding to service requests and inquiries from researchers at any stage in the data lifecycle in order to work together on best practices, implement data management services, and help link users to resources. We consult on data management planning; finding and using data; data collection, analysis, and processing; and distributing, sharing, and archiving data.
Aim: The aim of RDCG is to meet the needs of researchers who wish to follow and inform disciplinary best practices, meet funder, university, or contract requirements, and also to streamline consultative workflows across several organizations and departments at Yale.
Future Directions: Future projects for RDCG include expanded education and training offerings for library, ITS, teaching and learning support staff, and departmental staff, in addition to Yale students, faculty, postdocs, and other researchers. Simultaneously, we will continue our efforts to clarify and support university and other requirements for data management, sharing, and preservation.
Purpose: This poster examines the collaboration between an academic library, various university departments, and an open source data repository to help faculty and affiliated researchers curate, share and archive research data.
Brief Description: The Harvard Dataverse (https://dataverse.harvard.edu) is powered by the Dataverse Project, open source data publishing software that has been developed at Harvard’s Institute for Quantitative Social Science (IQSS) for nearly a decade. The Harvard Dataverse has recently been collaborating with Harvard Library, Harvard Medical School, the Harvard-Smithsonian Center for Astrophysics (CfA), and other groups from the university to provide a data repository solution for sharing, publishing, and archiving research data for Harvard faculty and affiliated researchers. This collaboration has expanded the scope of the Dataverse Project to better support research data beyond the social sciences. The Harvard Dataverse team has also extended its services to provide user support, training, and targeted data curation services to the Harvard community.
Results/Outcome: Current and upcoming collaborative projects include: connecting faculty publications with their underlying research data by integrating Dataverse with Harvard’s institutional repository, Digital Access to Scholarship at Harvard (DASH); extending metadata support for astronomy datasets with the Center for Astrophysics (CfA) and for biomedical datasets with the Harvard Medical School (HMS); providing university-wide open data awareness and support via the Harvard Open Data Assistance Program (ODAP); making licensed datasets available to the Harvard community (Harvard Subscription Data Dataverse); helping researchers meet the requirements of funder-mandated data management plans through customized DMPTool services; and making faculty datasets more widely discoverable by exporting metadata (MARC) into the Harvard Library Catalog, HOLLIS.
Evaluation Method: Site metrics are used to measure whether usage increases, including the number of new datasets and the number of dataset downloads and views.
Team Teaching, Humor, and Informal Polling Techniques in NECDMC-Based Research Data Management Workshops at Brandeis University
Objective: This poster will detail a few pedagogical techniques incorporated into a series of workshops on research data management. These techniques were chosen to better engage workshop participants by making the material more individually relevant and relatable.
Methods: For the second semester in which workshops on research data management (derived from the NECDMC modules) were offered through the Brandeis University Library & Technology Services department, the librarian sought to make the sessions more engaging through several pedagogical techniques. A team teaching approach was employed by inviting senior members of the Technology Help Desk and Hardware Repair Shop to collaborate on and co-teach the workshops. Humor was employed strategically through icons, disaster stories, and select xkcd webcomic strips within the lecture and slides. Informal polling and direct encouragement to share personal anecdotes during the workshop sessions promoted active engagement.
Results: Workshop participants were visibly more engaged, asked more questions, and asked questions more directly relevant to the presented material than participants in the first semester's workshops, which were held before these pedagogical techniques were employed.
Conclusions: Active and engaged learning techniques are difficult to employ in what are essentially one-shot, hour-long instruction sessions, particularly when participants are largely unfamiliar with the material at hand. But by integrating some techniques to capture interest and encourage participants to make connections to their own experiences, we observed a deeper understanding and appreciation of the material.
Objective: Building awareness of new library services on any campus can be difficult, especially when these services are deemed “non-traditional.” To help us overcome this challenge, the Boston University Libraries partnered with the Mozilla Science Lab to launch a new initiative called “Study Group” on campus. This poster describes Mozilla’s community building philosophy, the initial results of this partnership, and the technologies we have used.
Methods: The Mozilla Science Lab launched its Study Group initiative during the spring of 2015 to help researchers practice open science through community-led, technology-driven workshops. Group members lead each workshop in an informal, approachable way that encourages members to be both teachers and learners. This approach has created a venue for librarians to engage with graduate students as peers and has opened new two-way communication channels. Additionally, by using open technologies advocated by Mozilla, like GitHub and Gitter, the library now engages researchers on the platforms they already prefer.
Results: Our first two events this spring had ten and twelve participants respectively and we have another ten events scheduled. Last fall we held six events with a total of fifty-five participants. This spring we have held nine events with a total of forty-four participants. Of note, library staff have led only two sessions. This is an important achievement because it has limited our investment in staff time while still allowing us to achieve beneficial outreach results. Less tangibly, participants are beginning to view the library as a partner in open research, as a resource for data sharing, and as a more technology-driven organization. Finally, outside of staff time our total investment (including launching bu.edu/study) has been $240 – predominantly for posters and other outreach materials.
Conclusions: Partnering with the Mozilla Science Lab has helped the library engage with graduate researchers in the sciences through community building in a peer-to-peer format. This relationship has helped both the BU Libraries and Mozilla achieve their respective goals to engage researchers in open research practices in a mutually beneficial way.
Objectives: The Research Data Management Roundtables are a collaborative effort to address the shared challenges relating to data services and the role of the library. Outside of listservs, conferences, or other multi-day events, there are few means to discuss issues related to beginning or sustaining data management initiatives among organizations, libraries, and librarians. The Roundtables are one method to address this issue.
Methods: The New England eScience Program (NN/LM NER) sponsored two Roundtable discussions in 2015. Each was a one-day event that included an activity based in the local area and the Roundtable discussion. Around twenty librarians from multiple New England institutions met at central locations in Amherst and Worcester, Massachusetts. The committee members facilitated the discussions with guidelines and pre-determined topics.
Results: “Outcomes” from the two discussions were posted to the e-Science Portal for New England Librarians Community Blog. The first Roundtable focused on “organizational structures for research data management services at our institutions,” and the second discussed “engaging faculty and graduate student researchers at our institutions.” Feedback on the most useful aspects of the Roundtables included: “questions were very helpful to guide the discussion”; “hearing other librarians’ ideas and successes, and commiserating about weaknesses and challenges”; and “learning I’m not the only one with questions.”
Conclusions: The Roundtables have been a successful form of professional development, allowing librarians to share experiences and training with other librarians in order to broaden their knowledge of research data services (RDS). Suggestions for the future include discussions on: data policies, policy development, and examples; how to gain traction with administration; and collaborating across campus.
Objective: New York University (NYU) Libraries provide research data services to diverse communities across several campuses. Until recently, these services have operated mostly independently of each other. At the main campus, NYU Data Services offers workshops, individual and group consultations, and traveling “road shows” on data management to the larger NYU community. At a separate medical center campus, the NYU Health Sciences Library (NYUHSL) supports a data catalog, data management education, and individualized lab support. Finally, Databrary, which is connected to NYU’s Digital Library Technology Services, provides a repository for behavioral and learning science researchers working primarily with video data to store, manage, and share the raw materials of their work with their colleagues. This poster will discuss how these disparate services have worked more closely together by identifying overlap, making connections between service offerings, and sharing knowledge and resources around data. This initiative enriches the overall mission and strategy of NYU Libraries to serve its student and research communities.
Methods: To ensure the better coordination of these data services, we began to hold regular, bi-monthly meetings to discuss strategies for improving data education material, integrating an institutional data catalog created by NYUHSL with main campus systems, and providing data-related outreach to institutional stakeholders. These groups have also collaborated on planning and hosting events on data-related topics including using Databrary, reproducibility in science, and data visualization. Finally, a resource sharing system was instituted across campuses for library faculty to collaborate and improve upon the instructional design of data management education, create outreach materials, and share ongoing project documentation.
Results: The new collaboration between NYU Data Services, NYUHSL and special projects like Databrary has served to break down existing institutional silos to provide better research and educational data services to NYU’s student and research communities. This collaboration has been essential for improving upon existing services, identifying new opportunities to support the data needs of institutional stakeholders, and providing increased levels of outreach. By fostering a better understanding of what data services are available across campuses through this ongoing collaboration, we are better able to identify and support our communities’ data needs.
Conclusion: Providing data management, curation, and storage services for a diverse and dynamic research community on campus is a demanding task that requires a distributed effort. Each service fills different gaps for researchers at varying stages of their research, but without interdepartmental communication each had decidedly less impact and reach. By collaborating and opening a line of communication, we have built a better understanding of how we can interact to provide stronger support to the student and research communities across campuses.
Objective: The Princeton Plasma Physics Laboratory (PPPL) is a Department of Energy (DOE) Laboratory operated by Princeton University. The DOE released a Public Access Plan in October of 2014 in response to the White House Office of Science and Technology Policy (OSTP) memo from February of 2013 titled “Increasing Access to the Results of Federally Funded Scientific Research”. This project sought to determine how the Princeton University Library and Office of Information Technology could help PPPL comply with the DOE Public Access Plan.
Methods: Several meetings were held between the PPPL committee charged with Public Access Plan compliance, the E-Science and Scholarly Communications Librarians, and Princeton’s DSpace repository architect and programmer. These meetings served to help librarians provide PPPL with information about OSTP and DOE requirements and to determine how the Library and OIT could help PPPL with compliance.
Results: Librarians helped align the PPPL Data Management Plan with DOE requirements. Librarians also helped PPPL’s communications and technology licensing staff understand open access to published research and consulted on publisher copyright policies. The E-Science Librarian, DSpace repository architect, and DSpace programmer worked with PPPL to set up a DSpace community, associated metadata, and guidelines for documentation to make the data underlying publications publicly accessible.
Conclusion: Many U.S. funding agencies have released public access plans in response to the OSTP memo from February 2013, with requirements for open access to articles, data management plans, and making data underlying publications publicly available. Librarians can play a role in helping their institutions respond to these requirements.
Three years ago, the Office of Science and Technology Policy released the memo “Increasing Access to the Results of Federally Funded Scientific Research.” So far, 16 agencies have released plans. These new requirements relate to information access, so librarians are well placed to help researchers and grants administrators comply. Many librarians have previous experience with the NIH Public Access Policy and/or NSF data management plan requirements, so the transition to the new mandates should be easy. This breakout session will help you focus your efforts on the most important aspects of public access and data management plans when helping researchers with compliance.
Margaret Henderson is Director of Research Data Services and Hillary Miller is Scholarly Communications Outreach Librarian, Virginia Commonwealth University Libraries.
In this session Leah will discuss her experiences working on an NIH Supplement for Informationist Services grant, what was accomplished, and what she learned along the way. Within the psychiatric neuroimaging research community, data and resource sharing have become accepted as standard, but issues related to attribution and citing data in novel research are still hindering meaningful reuse. This project aimed to illustrate a system of data identification that would not only allow for proper citation of whole datasets, but maintain the chain of attribution in derived and remixed datasets, allowing for a more complete picture of research impact and author contribution.
Leah Honor is Library Fellow and Informationist Liaison to the Child and Adolescent Neurodevelopment Initiative, University of Massachusetts Medical School.
Researchers are under increasing pressure to manage, organize, describe, and document their data in ways that enable others to discover, understand, and reuse their work. However, the knowledge and skills needed to be successful in these tasks are not often a part of students' education in college or graduate school. Librarians have an opportunity to address this gap in students' education by developing data literacy programming, but developing effective data literacy programs can seem daunting.
This session will introduce attendees to a model for creating data literacy programming developed as a part of the Data Information Literacy (DIL) project. We will begin by reviewing the findings from interviews conducted with faculty and students at four universities. We will then walk through the DIL model step by step. Finally, participants will work through case studies to explore potential opportunities and generate possible approaches to offering data literacy programs.
Jake Carlson is Research Data Services Manager, University of Michigan.
Data repositories: the answer that actually came with a question. Funders, journal publishers, and disciplinary societies recognize the benefits of long-term access to valuable data that could validate results, increase scholarly democracy, or possibly lead to future discoveries. With this in mind, a majority of research now being done in academia is subject to data sharing requirements that the underlying data be publicly accessible, citable, and preserved. While many subject-based data repositories help make this happen, particularly for computing-intensive disciplines with shared infrastructure, such as high-energy physics or real-time climate monitoring, who will manage the "long tail" of smaller or multi-disciplinary research data?
Our institutional repositories (IRs) could be the answer. With a few key policy decisions, and robust review and curation procedures, libraries are well-positioned to help researchers comply with mandates to share and archive their data. Whether you use Hydra, DSpace, Fedora, EPrints, or Digital Commons, this talk will outline important issues to consider as you build new capacity with existing IR infrastructure or a custom data repository, including staffing, curation procedures, and metadata and documentation requirements. Finally, it will explore the results of, and faculty response to, launching the Data Repository for the University of Minnesota in 2015, which is based on the Libraries’ existing IR service. Our data submission process, curation procedures, faculty usage, and lessons learned will be placed in context of our broader data management and curation program.
Lisa Johnston is Research Data Management/Curation Lead and Co-Director of the University Digital Conservancy, University of Minnesota.
Breakout Session Descriptions: 2016 University of Massachusetts and New England Area Librarian e-Science Symposium
Descriptions of the Breakout Sessions held at the 8th annual University of Massachusetts and New England Area Librarian e-Science Symposium, held Wednesday, April 6, 2016 at the University of Massachusetts Medical School, Worcester, MA. Sessions include Data Information Literacy, Compliance, Data Repositories, and Informationist.
Kendall Roark, PhD, is Assistant Professor and Research Data Specialist at Purdue University. In her keynote presentation, she provides a broad perspective on the research data management services that U.S. and Canadian libraries are implementing.