The Pediatric Cancer Data Commons (PCDC) brings together clinical, genomic, and imaging data from institutions around the world that are working alongside us to transform pediatric cancer research and outcomes.

Headquartered at the Center for Research Informatics (CRI) at the University of Chicago, the PCDC works with international leaders in pediatric cancers and the National Cancer Institute to develop and apply uniform data standards, facilitating the collection, combination, and analysis of data from many different sources. By harmonizing existing clinical research data and leading international efforts to standardize data collection, we are breaking down long-standing barriers that have held back advancements in research on rare diseases.

By centralizing expertise in clinical cancer research, technology, and the international legal and regulatory landscape, we streamline the commons development process. This significantly increases the number of cases available for study, expedites the sharing of data across borders, and considerably reduces the time and cost that physicians and researchers would otherwise spend creating a separate data commons for each pediatric cancer. Our aim is to leverage this unique, collaborative, consortium-based approach to enable new and meaningful discoveries about pediatric cancers.

A data what…?

Researchers often store their information in databases or patient registries. These databases are usually siloed sets of a single type of data (e.g., clinical, genomic, pathology, or imaging data) that are stored in one place, such as an Excel spreadsheet. A data commons is an environment that facilitates the connection and aggregation of multiple disparate types of data into a single cloud-based environment for researchers. Data commons are accessed via data platforms such as a website or application on a mobile device or computer. Data platforms may also include tools for discovering cohorts of patients, aggregating disparate types of information, and analyzing and visualizing the data. Permissions to see, analyze, and sometimes download the data are regulated by governance rules established by each consortium.

Illustration showing multiple different databases interoperating with each other in a data commons, with data displaying on computer and tablet screens via a data platform

Taking a new approach

The PCDC has several distinguishing factors contributing to its success and providing much-needed solutions to challenges and shortcomings in pediatric cancer research.

Strength in diversity and numbers

The most significant barrier to large-scale discovery and advancement in pediatric cancers is the lack of data and samples for study. Inability to share data collected during clinical trials is a devastating blow for innovation and discovery. We are here to solve this problem. The PCDC engages researchers around the world and enables them to easily share and harmonize data, perform cohort searches, and request and analyze data. Our expertise in international privacy laws helps us streamline and expedite international data sharing agreements. This allows the PCDC to collect data on more patients from more diverse backgrounds, thus helping pediatric cancer researchers obtain larger sample sizes and richer data with which to power their projects.

Establishing common data standards

Defining essential data elements and reaching consensus on a universal data dictionary are critical to creating a sustainable and effective data commons. Once consensus on data standards and definitions has been reached, we are able to harmonize researchers’ retrospective data sets for storage in the commons, and researchers are able to collect prospective data in unison according to the adopted standards.

Consortium-based decision making

The PCDC convenes international leaders and cooperative groups studying pediatric cancers. These groups provide disease-specific expertise throughout the development and implementation of data standards and commons. Through our establishment of data governance procedures, we ensure that the commons meet the needs of each group, address previous barriers to successful collaboration, and facilitate trust and consensus among all members.

Consortia

Frequently Asked Questions

What makes the PCDC different from other cancer data commons efforts?

There are a number of cancer data sharing initiatives, each of which has its own priorities and goals to help solve a piece of the cancer puzzle. Several of these are devoted to pediatric cancer. For example, the Gabriella Miller Kids First Data Resource Center has a distinct focus on exploring the genetic underpinnings of, and connections between, pediatric cancers and structural birth defects, while the St. Jude Cloud helps researchers explore and visualize next-generation sequencing data for pediatric cancers and other diseases.

We are focused on collecting, standardizing, and harmonizing disparate sources and types of pediatric cancer clinical research data from consortia and research groups around the world. Our PCDC team works with global consortia to create international clinical data standards, allowing us to assemble some of the world’s largest collections of interoperable pediatric cancer clinical data. Furthermore, our goal is to connect these clinical data to the information in other commons, thus enriching the information in those valuable resources. Finally, we aim to make the data research-ready by providing researchers with tools to perform cohort discovery, data visualization, and data analysis in real time on the commons platform.

Why do we need a new way of aggregating research data?

While researchers have formed cooperative groups to share and standardize data for several decades, there are several reasons that the scientific community needs a new vision of data sharing.

Manual processes: Currently, clinical research coordinators document and maintain extensive research records in paper binders. Paper documentation and manual extraction into electronic databases are at best inefficient and at worst error-prone and dangerous. Multiple layers of data checking and auditing are required to help ensure completeness and accuracy. Physical storage and archiving of research shadow binders alongside electronic databases are costly and can result in misplaced or lost protected health information (PHI). Let’s face it: pediatric clinical trials are stuck in the 1990s!

Automatic extraction of data from electronic health records into secure databases reduces opportunity for errors in data sets; stores patients’ data more securely; and reduces the workload of clinical research coordinators, allowing them to focus more energy on patient-related research concerns.

Lack of data standards: When data are gathered independently and siloed in individual databases, it can be difficult or impossible to merge them. For example, researchers may have different ways of categorizing tumor types and/or sites. If these categories and values cannot be aligned, the data on these variables cannot be compared. Even when data can be combined, the process of harmonizing data from disparate sources retrospectively can be cumbersome and extremely time-consuming. In order to aggregate enough patient data to power studies that are capable of making truly significant discoveries in pediatric oncology, researchers around the world must work together to establish universal data standards so that they can collect data prospectively in unison. Our UChicago PCDC team is passionate about standards and interoperability.
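To make the tumor-site example concrete, here is a minimal sketch of retrospective harmonization, in which each site's local categories are recoded to a shared vocabulary. The site vocabularies, the standard terms, and the `harmonize` function are hypothetical illustrations, not the PCDC's actual standards or tooling.

```python
# Hypothetical local vocabularies from two sites, each mapped onto one shared standard.
SITE_A_TO_STANDARD = {"adrenal": "Adrenal gland", "abd": "Abdomen", "chest": "Thorax"}
SITE_B_TO_STANDARD = {"Adrenal Gland": "Adrenal gland", "Abdominal": "Abdomen", "Thoracic": "Thorax"}


def harmonize(records, mapping, field="tumor_site"):
    """Recode `field` to the shared standard.

    Returns (harmonized, unmapped): records whose value could be recoded,
    and records whose value has no mapping and needs human review.
    """
    harmonized, unmapped = [], []
    for record in records:
        value = record.get(field)
        if value in mapping:
            harmonized.append({**record, field: mapping[value]})
        else:
            unmapped.append(record)
    return harmonized, unmapped


site_a = [{"id": "A1", "tumor_site": "adrenal"}, {"id": "A2", "tumor_site": "pelvis"}]
site_b = [{"id": "B1", "tumor_site": "Adrenal Gland"}]

a_ok, a_unmapped = harmonize(site_a, SITE_A_TO_STANDARD)
b_ok, b_unmapped = harmonize(site_b, SITE_B_TO_STANDARD)

# Records from both sites can now be compared on the same variable.
combined = a_ok + b_ok
```

Values that cannot be aligned (such as "pelvis" above) are set aside rather than guessed at, which is the part of retrospective harmonization that tends to be slow. Collecting data prospectively against an agreed standard avoids the mapping step altogether.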

Differing data models: There are several oncology data models in existence at present. The international pediatric oncology field must work together to create an agreed-upon, comprehensive data model alongside its data standards consensus work so that data from different sources can be unified in a single location. The UChicago PCDC team is creating a culture of data stewards among the pediatric oncology research community, encouraging everyone to take responsibility for how data are collected, stored, and shared.

International regulations: Significant advances in pediatric cancer research depend on large sample sizes and rich data sets. Now that the necessary technology and clinical research data exist worldwide, researchers have been trying to share data across borders to make more meaningful discoveries together. However, laws and regulations around clinical research data, protected health information (PHI), and data privacy and security make sharing data sets across international boundaries difficult and costly. Research teams enter into data sharing agreements and contracts with the help of their respective legal teams.

Though the creation of cooperative research groups has helped to streamline some of these processes, this work must still be duplicated by researchers in different disease groups. By creating a central pediatric cancer data commons, our team is able to draw on our experts in international data laws and data governance to streamline this process for cooperative groups across multiple pediatric cancers. This centralized expertise not only expedites the laborious process of contract writing but also significantly reduces costs and time for researchers, so that they can focus their time and funds on their research and patients.

How do you ensure your data standards are kept up to date?

Once data standards have been collectively determined by the cooperating groups in a consortium, maintenance of those data standards is crucial to the sustainability of data quality. We have created a change request process through which any contributing institution or group can submit a desired alteration or addition to the data standard. An initial change request form, available through our University of Chicago REDCap platform, provides a uniform method for collecting, processing, and addressing submitted requests.

It takes a village to maintain standards. Our UChicago PCDC team is focused on inclusion and transparency when it comes to data and standards. Rather than select or create a single standard, we work to include as many stakeholders as possible, including the disease groups from around the world, the National Cancer Institute and National Institutes of Health, the Clinical Data Interchange Standards Consortium (CDISC), and the CTSA National Clinical Data to Health (CD2H) program. By encouraging cooperation among these and other standards groups, we hope to build a community of invested researchers and stakeholders who will work to maintain the standards, thus facilitating interoperability of precious clinical and research data from children with cancer.

What are data dictionaries and data models?

A data dictionary is a structured database that comprises an exhaustive list of data fields with their respective definitions and metadata details (including data type, source, and units of measurement when applicable). Building a standardized set of definitions for a data commons has several benefits. First, the initial process of building a data dictionary allows a group to craft a clear and concise definition of each term, which can stand as the example against which other definitions are compared. Second, the data dictionary assists with data harmonization. When a new group plans to contribute its data to a commons, the data dictionary provides a clear starting point for that group to see how it might best align its data model to that of the commons. Finally, a properly maintained data dictionary will reflect any changes and additions that are made to the data model, serving as a reference for each contributing institution as the data commons evolves.
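As a rough illustration, a data dictionary entry can be thought of as a small record holding a field's name, definition, data type, source, and units, plus any permissible values. The sketch below uses that idea to check whether an incoming record conforms. The field names, definitions, permissible values, and the `validate` helper are all hypothetical and are not drawn from a PCDC dictionary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DictionaryField:
    """One data dictionary entry: a field plus its definition and metadata."""
    name: str
    definition: str
    data_type: type
    source: str
    units: Optional[str] = None
    permissible_values: Optional[Tuple[str, ...]] = None


# Hypothetical two-field dictionary, keyed by field name.
DATA_DICTIONARY = {
    f.name: f
    for f in [
        DictionaryField(
            name="age_at_diagnosis",
            definition="Age of the patient when the diagnosis was made.",
            data_type=int,
            source="clinical record",
            units="days",
        ),
        DictionaryField(
            name="tumor_site",
            definition="Anatomic site of the primary tumor.",
            data_type=str,
            source="pathology report",
            permissible_values=("Adrenal gland", "Abdomen", "Thorax"),
        ),
    ]
}


def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, value in record.items():
        spec = DATA_DICTIONARY.get(name)
        if spec is None:
            problems.append(f"{name}: not defined in the data dictionary")
        elif not isinstance(value, spec.data_type):
            problems.append(f"{name}: expected {spec.data_type.__name__}")
        elif spec.permissible_values and value not in spec.permissible_values:
            problems.append(f"{name}: '{value}' is not a permissible value")
    return problems


conforming = validate({"age_at_diagnosis": 730, "tumor_site": "Thorax"})
nonconforming = validate({"age_at_diagnosis": "2 years", "tumor_site": "chest", "favorite_color": "blue"})
```

A contributing group could run a check like this against its own export to see where its fields already line up with the commons and where they would need to be recoded.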

A data model is an abstract model designed to gather and organize information about data and show this information as data entities and the relationships among them. It shows how data will be stored in a data commons and is informed by the data dictionary.

Creating a standard data model is crucial because it dictates how a data commons and its developers will manage incoming data. Alongside the data dictionary, it is the basis for how developers determine if new data are in the correct format to be added to the commons. Having this diagram, or conceptual map, also aids users. With an accurate data model, researchers and physicians who have been approved to obtain a specific data set from the commons are able to receive their data more quickly and easily. Ultimately, a data model helps a data commons consortium and development team standardize and expedite the process of data acquisition, storage, and retrieval.
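One way to picture a data model is as a handful of entities and the relationships among them. The sketch below assumes a deliberately simplified model in which a subject has many diagnoses and a diagnosis has many samples. The entity names, attributes, and the `samples_for` helper are hypothetical and do not represent the PCDC data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    sample_id: str
    sample_type: str  # e.g. "tumor tissue"


@dataclass
class Diagnosis:
    diagnosis_id: str
    tumor_site: str
    samples: List[Sample] = field(default_factory=list)  # one diagnosis, many samples


@dataclass
class Subject:
    subject_id: str
    diagnoses: List[Diagnosis] = field(default_factory=list)  # one subject, many diagnoses


def samples_for(subject):
    """Follow the subject -> diagnosis -> sample relationships to list every sample."""
    return [sample for diagnosis in subject.diagnoses for sample in diagnosis.samples]


subject = Subject(
    subject_id="S1",
    diagnoses=[
        Diagnosis("D1", "Adrenal gland", samples=[Sample("X1", "tumor tissue"), Sample("X2", "blood")]),
        Diagnosis("D2", "Thorax", samples=[Sample("X3", "tumor tissue")]),
    ],
)
sample_ids = [s.sample_id for s in samples_for(subject)]
```

Because the relationships are explicit, a retrieval question such as "which samples belong to this subject?" becomes a simple traversal, which is one reason an agreed model can speed up both loading and retrieving data.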

How do you keep data safe?

The UChicago Center for Research Informatics has a long history of building and supporting systems that handle protected health information (PHI). We have an entire team devoted to making sure that our systems adhere to HIPAA security and privacy regulations, and we retain an expert on GDPR (European) privacy rules. Even though most of the data we store in our commons are not PHI, we treat all of it as such. Importantly, we have processes in place to ensure that the needs of each contributor can be met by assigning certain rules or embargo periods to data that are submitted. We want every research team to feel comfortable with us as stewards of their data, which is why data are only collected, stored, and distributed according to terms defined in each data use agreement.
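As an illustration of how an embargo period might be enforced in software, the sketch below releases a submission to other researchers only once its embargo date has passed. The `Submission` type, the `is_releasable` rule, and the group names are hypothetical. In practice the terms come from each data use agreement and may be considerably more detailed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Submission:
    contributor: str
    embargo_until: Optional[date] = None  # None means no embargo was requested


def is_releasable(submission, requester, today):
    """Contributors can always see their own data; others wait until the embargo has passed."""
    if requester == submission.contributor:
        return True
    return submission.embargo_until is None or today >= submission.embargo_until


embargoed = Submission(contributor="Group A", embargo_until=date(2030, 1, 1))
open_data = Submission(contributor="Group B")

own_access = is_releasable(embargoed, "Group A", date(2029, 6, 1))
early_access = is_releasable(embargoed, "Group C", date(2029, 6, 1))
later_access = is_releasable(embargoed, "Group C", date(2030, 1, 1))
open_access = is_releasable(open_data, "Group C", date(2029, 6, 1))
```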

Why use a data commons instead of REDCap or other platforms?

We love REDCap and use it for many of our projects at the University of Chicago. As a survey tool, REDCap is a valuable platform for clinical research, and many researchers have been trained to use it and have relied on it in research projects. While REDCap remains useful for researchers in many ways, it has several significant limitations when compared to the power of using a data commons to collect, store, and share research data. First, REDCap is not capable of aggregating disparate types of data (clinical, imaging, genomics, pathology, etc.). Second, there is a cap on how much data researchers can store in REDCap, limiting the size of data sets that can be studied. Third, manually entering clinical data into REDCap is labor-intensive and creates additional opportunities for user error, compared to automatic extraction of data from electronic health records into a data commons. Lastly, data accrued in REDCap are still siloed: REDCap does not address the longer-term data standardization needs of the research community, nor does it eliminate the need for data harmonization.

Beyond REDCap, there are many other new and exciting platforms to collect, store, and share data. We are thrilled to see so many fantastic technologies emerge to help researchers. In many cases, the UChicago PCDC can interoperate with these platforms to share data and tools, especially if these systems have adopted best practices for research data. While there are many ways to leverage data for research, we feel that our approach is unique and important for creating and sustaining a culture of data sharing and interoperability that is so critical for making advances in cancer research. We look forward to working to fight children’s cancer together.

How do collaborators access PCDC data and/or contribute data to the commons?

How can I help support the PCDC?

Thank you for your interest in supporting our work! The PCDC is sustained through research grants, public-private partnerships, and generous philanthropic support from organizations and individuals. Learn more and make a gift here.