In 2019, our team began work on the Center for Cancer Data Harmonization (CCDH) as part of a 3½-year, $8.8 million contract awarded by the Frederick National Laboratory for Cancer Research on behalf of the National Cancer Institute (NCI). Our PCDC staff are an integral part of a multi-disciplinary, multi-institutional team of experts in cancer, data standards, and technology who will design and develop the Center. The aim of the CCDH is to improve interoperability among the cancer data repositories and resources of the Cancer Research Data Commons, making it possible for researchers to ask more sophisticated questions across a broader range of data from different sources. The CCDH is currently in development and is projected to launch in 2021.
Our successes and lessons learned building the PCDC enable our team to contribute to the CCDH project in multiple areas. The project focuses on five key areas: community development, data model harmonization, ontology and terminology ecosystem, software tools and data quality, and program management. Our team co-leads two of these five key workstreams: Community Development and Data Model Harmonization.
The Community Development Workstream is crucial for assessing the current state of cancer data systems and the needs of researchers as the CCDH works to facilitate an interoperable ecosystem. To develop a deeper understanding of the existing cancer data landscape and identify opportunities for improvement, this team is conducting focus group interviews with representatives from core US cancer data repositories and commons. The specifications and requirements gathered in these interviews will inform the development of the Center’s resources. Later in the project, this group will liaise between the CCDH team and the project stakeholders by providing training, help desk services, and ongoing project support once the Center has launched.
The Data Model Harmonization Workstream, co-led by Brian Furner, will make it possible for the CCDH to bring disparate data from various data commons (e.g., the Genomic Data Commons, the Human Tumor Atlas Network, the Imaging Data Commons, the Proteomic Data Commons, and the Integrated Canine Data Commons) together under a standardized, interoperable data model. This team works closely with the Tools and Data Quality Workstream to plan, develop, and implement tools to align the data models of the various commons. This work is critical to creating a global data ecosystem that aligns the diverse perspectives of project stakeholders. The data model harmonization group will also lead testing of the harmonized model among the groups.
University of Chicago
Oregon State University
Oregon Health & Science University
Johns Hopkins University
University of North Carolina
PCDC CCDH Team
This quarter, the CCDH team focused on relationship building, including identifying areas of collaboration and overlap with the newly awarded Cancer Data Aggregator team. Working with representatives from CCDH nodes and work groups, the team is compiling a landscape analysis and identifying data flow requirements in order to provide recommendations for future Cancer Research Data Commons architecture.
Community Development Workstream: This group launched a quarterly newsletter to keep CCDH collaborators up to date on progress. Work on the CCDH web portal, which will provide important information, resources, and tutorials for nodes, continues.
Data Model Harmonization Workstream: This team continues to develop the CRDC Conceptual Data Model into an implementable model and has begun collaborative meetings with the Cancer Data Aggregator team to ensure work is integrated across CRDC projects. The team also holds biweekly office hours focusing on questions and topics submitted by community members.
Community Development Workstream: Work on the CCDH web portal, which will provide important information, resources, and tutorials for nodes, continues. The PCDC team is developing a communications plan to keep stakeholders engaged and informed, beginning with a quarterly newsletter.
Data Model Harmonization Workstream: This group is working to define a shared data model across the CRDC nodes, leveraging terminological and modeling standards where possible. The initial phase of this work was completed in May 2020 in the form of a prototype called the CRDC Conceptual Domain Model (CDM). This prototype offers a more normalized and standards-aligned representation of the content that has been collected from the CCDH community.
Community Development Workstream: The PCDC team is working with CCDH leadership and project managers on the content and design for the CCDH website, which will provide information, resources, and support for cancer data nodes and researchers.
Data Model Harmonization Workstream: The team has documented and aligned data dictionaries for the Genomic Data Commons, Proteomic Data Commons, and Integrated Canine Data Commons and created an Aggregated Data Model (ADM). The team is currently mapping the ADM to BRIDG.
Community Development Workstream: A first round of interviews with CCDH nodes has been completed, covering the Data Commons Framework (DCF), Human Tumor Atlas Network (HTAN), Integrated Canine Data Commons (ICDC), Imaging Data Commons (IDC), Genomic Data Commons (GDC), and Proteomic Data Commons (PDC).
Data Model Harmonization Workstream: Efforts have focused on a close review of the data models of three Cancer Research Data Commons (CRDC) nodes: GDC, PDC, and ICDC. This work will result in consistently formatted data model artifacts for the CRDC nodes that will facilitate production of the harmonized CRDC-H model, a key deliverable for the first phase of the CCDH contract.