In 2019, our team began work on the Center for Cancer Data Harmonization (CCDH) under a 3½-year, $8.8 million contract awarded by the Frederick National Laboratory for Cancer Research on behalf of the National Cancer Institute (NCI). PCDC staff are an integral part of a multidisciplinary, multi-institutional team of experts in cancer, data standards, and technology who will design and develop the Center. The CCDH aims to improve interoperability among cancer data repositories and resources so that researchers can ask more sophisticated questions across a broader range of data from different sources. The CCDH is currently in development and is projected to launch in 2021.
Our successes and lessons learned in building the PCDC enable our team to contribute to the CCDH project in multiple areas. The project is organized into five workstreams: community development, data model harmonization, ontology and terminology ecosystem, software tools and data quality, and program management. Our team co-leads two of them: Community Development and Data Model Harmonization.
The Community Development Workstream assesses the current state of cancer data systems and the needs of researchers as the CCDH works to facilitate an interoperable ecosystem. To better understand the existing cancer data landscape and identify opportunities for improvement, this team is conducting focus group interviews with representatives from core US cancer data repositories and commons. The specifications and requirements gathered in these interviews will inform the development of the Center’s resources. Later in the project, this group will liaise between the CCDH team and project stakeholders, providing training, help desk services, and ongoing project support once the Center has launched.
The Data Model Harmonization Workstream, co-led by Brian Furner, will make it possible for the CCDH to bring disparate data from various data commons (e.g., the Genomic Data Commons, the Human Tumor Atlas Network, the Imaging Data Commons, the Proteomics Data Commons, and the Integrated Canine Data Commons) together under a standardized, interoperable data model. This team works closely with the Tools and Data Quality Workstream to plan, develop, and implement tools that align the data models of the various commons. This work is critical to creating a global data ecosystem that aligns the diverse perspectives of project stakeholders. The data model harmonization group will also lead testing of the harmonized model among the groups.
University of Chicago
Oregon State University
Oregon Health & Science University
Johns Hopkins University
University of North Carolina
PCDC CCDH Team
Community Development Workstream: The PCDC team is working with CCDH leadership and project managers on the content and design for the CCDH website, which will provide information, resources, and support for cancer data nodes and researchers.
Data Model Harmonization Workstream: The team has documented and aligned data dictionaries for the Genomic Data Commons, Proteomics Data Commons, and Integrated Canine Data Commons and created an Aggregated Data Model (ADM). The team is now mapping the ADM to BRIDG.
Community Development Workstream: A first round of interviews with CCDH nodes has been completed with Data Commons Framework (DCF), Human Tumor Atlas Network (HTAN), Integrated Canine Data Commons (ICDC), Imaging Data Commons (IDC), Genomic Data Commons (GDC), and Proteomics Data Commons (PDC).
Data Model Harmonization Workstream: Efforts have been focused on close review of the following Cancer Research Data Commons (CRDC) node data models: GDC, PDC, and ICDC. This work will result in consistently formatted data model artifacts for the CRDC nodes that will facilitate production of the harmonized CRDC-H model, a key deliverable for the first phase of the CCDH contract.
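To give a sense of what aligning node data dictionaries can involve, here is a minimal Python sketch. It is an illustration only and does not describe the CCDH team's actual tooling or the real ADM. The field names, the alignment table, and the function `build_aggregated_model` are all hypothetical, and the dictionary excerpts are not taken from the real GDC, PDC, or ICDC dictionaries.

```python
from collections import defaultdict

# Hypothetical excerpts of node data dictionaries (field name -> description).
# These are illustrative only, not the real GDC, PDC, or ICDC dictionaries.
NODE_DICTIONARIES = {
    "GDC": {
        "primary_diagnosis": "Diagnosis text",
        "age_at_diagnosis": "Age when diagnosed",
    },
    "PDC": {
        "primary_diagnosis": "Diagnosis text",
        "gender": "Reported gender",
    },
    "ICDC": {
        "disease_term": "Diagnosis text",
        "patient_age": "Age of the patient",
    },
}

# Hypothetical alignment table: (node, field) -> shared property name.
FIELD_ALIGNMENT = {
    ("GDC", "primary_diagnosis"): "diagnosis",
    ("PDC", "primary_diagnosis"): "diagnosis",
    ("ICDC", "disease_term"): "diagnosis",
    ("GDC", "age_at_diagnosis"): "age",
    ("ICDC", "patient_age"): "age",
}


def build_aggregated_model(dictionaries, alignment):
    """Group node fields under shared property names.

    Returns a pair: the aggregated model (shared property -> {node: field})
    and a list of (node, field) pairs that have no alignment entry yet.
    """
    aggregated = defaultdict(dict)
    unmapped = []
    for node, fields in dictionaries.items():
        for field in fields:
            target = alignment.get((node, field))
            if target is None:
                # Keep track of fields that still need review.
                unmapped.append((node, field))
            else:
                aggregated[target][node] = field
    return dict(aggregated), unmapped


aggregated_model, unmapped_fields = build_aggregated_model(
    NODE_DICTIONARIES, FIELD_ALIGNMENT
)
for shared_name, sources in aggregated_model.items():
    print(shared_name, sources)
print("Needs review:", unmapped_fields)
```

In this sketch, fields without an alignment entry are reported rather than dropped, so that gaps between node models stay visible for later review.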