Data for the Common Good applies our expertise in data sharing, as well as the streamlined and scalable infrastructure and processes that we have developed for the Pediatric Cancer Data Commons, to other areas where we can make a difference. Our portfolio of projects includes major government initiatives, tools for patients and clinicians, data commons for rare diseases, and more. With our unique approach to data sharing, which prioritizes relationship-building, data quality, and sustainability, we are proud to work in partnership with researchers, clinicians, and patients to drive new science and improve lives.
Data Commons
PREDICT
Monogenic diabetes consortium and data commons
Monogenic diabetes, a subtype of diabetes caused by changes to a single gene, accounts for 1 to 4 percent of diabetes cases in the US. Because the condition is rare, a single source for patient data will be a critical resource for researchers to advance science and clinical practice. Working closely with the University of Chicago Kovler Diabetes Center, PREDICT (PREcision DIabetes ConsorTium) has brought together stakeholders from more than a dozen institutions to build a commons that will include clinical data, patient-reported outcomes, and data from wearable devices such as continuous glucose monitors. The consortium is applying D4CG methods to develop a data dictionary and implement governance structures.
Monogenic Epilepsy Data Commons
EEG data commons for pediatric monogenic epilepsies
A new initiative launched in summer 2024 is extending D4CG data commons development to the field of pediatric monogenic epilepsies. With funding from the Chan Zuckerberg Initiative, we are poised to create the world’s largest data commons for these rare and challenging epilepsies caused by individual gene mutations. This project is bringing together medical centers, registries, and patient advocacy groups to collect and harmonize clinical, genomic, electroencephalogram (EEG), and other types of data, addressing the urgent need for a unified, high-quality, and highly annotated data source.
Partnering with Dr. Doug Nordli, a leading pediatric epilepsy expert and co-director of the University of Chicago’s Comprehensive Epilepsy Center, D4CG will leverage the University’s expertise in the quantitative analysis of EEGs to build a data commons using the same methodologies that were successful in building the PCDC. This effort will integrate raw and analyzed EEG data with genomic information and crucial clinical details to enhance the richness of these data sources and open new research avenues. In future stages, we plan to collect cognitive and outcome measures, as well as treatment data via linkage with electronic health records, in order to further increase the value of the data commons and expand the scope of research it can support.
Sociome Data Commons
Studying the social determinants of health
The Sociome refers to the non-clinical aspects of life affecting health: social, environmental, behavioral, psychological, and economic factors. Integrating these social determinants of health with clinical data can create new opportunities for valuable research, but requires that they be comprehensively collected in a way that is suitable for large-scale analysis. D4CG is part of a multidisciplinary consortium in Chicago working to build the Sociome Data Commons, a resource that will allow researchers to integrate the social context of disease with clinical and genomic data to better understand, predict, and treat numerous conditions and improve human health.
Learn more at the Institute for Translational Medicine (ITM)
Tools for Patients and Clinicians
GEARBOx
Clinical trials matching tool
As part of The Leukemia & Lymphoma Society (LLS) PedAL initiative, a pillar of The Dare to Dream Project, we developed a clinical trials matching tool for clinicians to rapidly and accurately match children with relapsed acute myeloid leukemia to targeted treatments in North America. GEARBOx, launched in 2022, is a web-based tool that uses a matching algorithm to identify potentially appropriate clinical trials based on COG eligibility criteria and the patient’s clinical data, immunophenotype, and genomic profile. We are now working on improving GEARBOx by adding new features, integrating more trials, and extending this tool to additional tumor types.
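As a rough illustration of what rule-based trial matching can look like, the sketch below checks a patient record against each trial's eligibility criteria and returns the trials whose criteria are all met. It is not the GEARBOx algorithm: the trial IDs, criteria, thresholds, and patient fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A patient is modeled as a plain dictionary; the field names are hypothetical.
Patient = Dict[str, object]


@dataclass
class Criterion:
    """One eligibility rule: a description plus a test on the patient record."""
    description: str
    is_met: Callable[[Patient], bool]


@dataclass
class Trial:
    """A trial with a list of eligibility criteria that must all be met."""
    trial_id: str
    criteria: List[Criterion] = field(default_factory=list)


def match_trials(patient: Patient, trials: List[Trial]) -> List[str]:
    """Return the IDs of trials for which every criterion is met."""
    return [
        trial.trial_id
        for trial in trials
        if all(criterion.is_met(patient) for criterion in trial.criteria)
    ]


# Hypothetical example data (not real trials or real eligibility rules).
example_patient: Patient = {
    "age_years": 9,
    "diagnosis": "relapsed AML",
    "mutations": {"FLT3-ITD"},
}

example_trials = [
    Trial("TRIAL-A", [
        Criterion("Age under 22 years",
                  lambda p: p.get("age_years", 999) < 22),
        Criterion("FLT3-ITD mutation present",
                  lambda p: "FLT3-ITD" in p.get("mutations", set())),
    ]),
    Trial("TRIAL-B", [
        Criterion("KMT2A rearrangement present",
                  lambda p: "KMT2A-r" in p.get("mutations", set())),
    ]),
]

matched = match_trials(example_patient, example_trials)
print(matched)  # ['TRIAL-A']
```

Keeping each criterion as a separate, described rule means a tool built this way could also report which criteria ruled a trial out, which may matter to clinicians reviewing potential matches.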
Bright Side Navigator
Support for pediatric cancer survivors
D4CG and Team Bright Side have partnered on a survivorship initiative, Bright Side Navigator, that aims to address the challenges childhood cancer survivors face in their follow-up care throughout their lives. The ultimate goal of this research study is to create a widely available ecosystem in which patients, providers, and researchers can access a wide range of data and care recommendations for survivors of cancer. In this ecosystem, cancer survivors will have continuous access to their treatment and follow-up data, with the ability to share data on demand with clinicians. In a later phase of the project, a de-identified version of the data could be added to a long-term follow-up data commons to give researchers greater access to these valuable data. The first phase of the Bright Side Navigator project is beginning in autumn 2024 and consists of an interview study to better understand the experiences of pediatric cancer survivors and their families.
Advances in Data Science
RESILIENT
Extracting survivorship insights from electronic health records
RESILIENT (Record Extraction and Survivorship Insights Leveraging Integrated EHR and NLP Technologies) is a three-year project focused on using electronic health records (EHR) and natural language processing (NLP) tools to address survivorship needs. Funded by a Hyundai Hope on Wheels Survivorship Collaboration Award, the project will include three phases. First, we will develop infrastructure to transfer EHR data of childhood cancer survivors via patient access APIs, the same connections used by Epic’s MyChart and other patient-facing platforms. Next, in collaboration with Dr. Guergana Savova, we will use the text processing tool DeepPhe to extract and summarize treatment exposure data, such as chemotherapy, radiation, and surgery, from free-text notes in the EHR. Finally, we will make the processed data available in a structured format for use by other platforms, such as research registries and digital tools for survivors like Passport for Care. Through the RESILIENT project, we have the opportunity to reduce the need for manual data entry into these important downstream platforms and improve clinical care and research for pediatric cancer survivors.
National Cancer Institute Projects
D4CG and the PCDC are an important part of a national data sharing ecosystem through the National Cancer Institute’s Childhood Cancer Data Initiative (CCDI). We are proud to contribute to the CCDI’s efforts to improve cancer prevention, treatment, quality of life, and survivorship and to ensure that researchers learn from every child with cancer.
CCDI Data Federation
With a collaborative group of institutions, we are working to make the data commons involved in CCDI interoperable by developing and implementing a common harmonized data model and an API for queries across the PCDC, St. Jude, Seven Bridges, and the Gabriella Miller Kids First Data Resource Center.
We have received $722,292 in funding for this project, 100% of which is financed with federal money.
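To give a sense of how a query across several data commons can work once they share a data model, the sketch below sends one set of filters to each node and merges the results, tagging every record with the node it came from. It is a minimal illustration under stated assumptions, not the CCDI federation API: the node names, record fields, and in-memory "nodes" are hypothetical stand-ins for real services.

```python
from typing import Callable, Dict, List

# A record in the (hypothetical) shared data model is a plain dictionary.
Record = Dict[str, object]
# A node is modeled as a function that takes filters and returns matching records.
NodeQuery = Callable[[Dict[str, object]], List[Record]]


def make_node(records: List[Record]) -> NodeQuery:
    """Build an in-memory stand-in for a data commons node."""
    def query(filters: Dict[str, object]) -> List[Record]:
        # A record matches when every filter key equals the requested value.
        return [
            r for r in records
            if all(r.get(key) == value for key, value in filters.items())
        ]
    return query


def federated_query(nodes: Dict[str, NodeQuery],
                    filters: Dict[str, object]) -> List[Record]:
    """Run the same filters against every node and merge the results."""
    results: List[Record] = []
    for node_name, query in nodes.items():
        for record in query(filters):
            # Tag each record with its source so provenance is preserved.
            results.append({**record, "source_node": node_name})
    return results


# Hypothetical nodes and records, for illustration only.
example_nodes = {
    "node_a": make_node([
        {"participant_id": "A-1", "diagnosis": "AML"},
        {"participant_id": "A-2", "diagnosis": "ALL"},
    ]),
    "node_b": make_node([
        {"participant_id": "B-1", "diagnosis": "AML"},
    ]),
}

aml_records = federated_query(example_nodes, {"diagnosis": "AML"})
print([r["participant_id"] for r in aml_records])  # ['A-1', 'B-1']
```

The same filters can be sent to every node only because the records use the same field names and values, which is the role a harmonized data model plays.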
C3DC
We are participating in developing the Childhood Clinical Data Commons (C3DC), a data node of the CCDI that will act as the primary source of individual-level data describing participants’ demographic and clinical characteristics. C3DC will interoperate with other CCDI data type-specific nodes such as genomics, imaging, and proteomics.
We have received $1,039,349 in funding for this project, with an Option Period that would extend the project with an additional $823,921 in funding. The total anticipated budget for this project is $1,863,270, 100% of which is financed with federal money.
Past Projects
CCDH
The Center for Cancer Data Harmonization (CCDH) was developed to drive the interoperability and accessibility of the data within the NCI Cancer Research Data Commons (CRDC). From 2019 to 2021, D4CG co-led the development of the CCDH alongside four other institutions, with a focus on community development and data model harmonization.
PCDC-H
In 2022, we concluded an 18-month project that established the foundations for integrating the PCDC with the NCI Cancer Research Data Commons (CRDC). By developing and mapping data to a common PCDC-H data model aligned with the NCI Thesaurus, we made it possible to link PCDC data with data in other CRDC nodes across the country, creating a robust, integrated resource for pediatric cancer research.
DI-Cubed
In partnership with Leidos Biomedical, in 2019 we developed a process for integrating radiology images into a data commons as part of the NCI Data Integration and Imaging Informatics (DI-Cubed) Project. Our pilot project tested the feasibility of linking image data to clinical data in a commons environment and serving this information to researchers in real time, with the INRG Data Commons serving as the paradigm system.