
Donate

Thank you for your interest in supporting our work!

The Pediatric Cancer Data Commons project welcomes donations from individuals and organizations. These funds support our team in maintaining the existing commons and consortia, improving and further developing the existing commons, integrating new partners and cancers, training new bioinformaticians and physicians in pediatric cancer informatics, and supporting pediatric cancer researchers and collaborators who contribute and use PCDC data to make meaningful discoveries.

If you are interested in investing in the PCDC at the University of Chicago, you may donate using the button below, contact us directly, or reach out to the Medicine & Biological Sciences Development Office at 773.702.6565.

Make a Gift

Learn more about us:

Connect. Share. Cure.


Research

How can data commons change how the world does research?

The potential for a commons-based approach to change the landscape of cancer research is explored in two of our recent publications. In “Using big data in pediatric oncology: Current applications and future directions” in Seminars in Oncology, we discuss the uses of big data in pediatric cancer, existing pediatric cancer registry initiatives and research, the challenges in harmonizing data to improve accessibility for study, and the future opportunities we see for innovation in this area. In “Data Commons to Support Pediatric Cancer Research” in the American Society of Clinical Oncology Educational Book, we describe current data commons and how they operate in the oncology landscape, and offer a practical paradigm for developing new commons. By centralizing data, processing power, and tools, there is a valuable opportunity to share resources and thus increase the efficiency, power, and impact of research.

You can follow along with our publications on ResearchGate.

A paradigm for building a pediatric cancer data commons

  1. Engage cooperative group(s)
  2. Define scope
  3. Identify funding source
  4. Identify infrastructure
  5. Engage project team
  6. Identify data sources
  7. Establish governance, create policies and procedures
  8. Create contributor / use agreements
  9. Create standards working group to create data dictionary, map elements
  10. Create database
  11. Build front-end query engine
  12. Create and execute communication and education plans
  13. Create sustainability model
Data Commons to Support Pediatric Cancer Research (full paper)
Using big data in pediatric oncology: Current applications and future directions (full paper)

Data commons power discovery.

Transforming the Way Researchers Share Data: Lessons from the Pediatric Cancer Data Commons
Presented by Sam Volchenboum, MD, PhD

University of Chicago Department of Pediatrics Grand Rounds, November 2020

FAQ for Researchers

How can researchers get data from the PCDC?

The PCDC is designed to protect the data of research subjects while maximizing the benefit to researchers. Any researcher can register for free to use the PCDC. Once registered, they can access aggregate data counts, and soon visualizations, within the commons platform. Data sets with line-level data can be requested through the commons portals; access is governed by the respective data-contributing consortium. Once a request is submitted, the executive committee for the relevant consortium approves or denies it, and PCDC staff follow up with the researcher accordingly.

Can researchers get data from the PCDC across multiple disease groups (e.g., survival data for a genomic finding found in both liquid and solid tumors)?

Yes, with the approval of the executive committee of each relevant disease group consortium. This approval process will be as streamlined as possible. Under the governance plan currently being developed and refined, such project requests will come through the PCDC Executive Committee but will still require approval from the individual disease group executive committees. While this may appear onerous, it is the only way to ensure that each disease group retains its autonomy in deciding how its data are used for research. Thus far, the disease consortia have supported this vision.
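The approval rule described above can be sketched in a few lines of Python. This is only an illustration, not the PCDC's actual governance software: the class and field names are hypothetical, and the assumption that a single denial blocks the whole request is ours, inferred from the requirement that every relevant executive committee approve.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class DataRequest:
    """Hypothetical cross-disease data request; all names are illustrative."""
    title: str
    consortia: List[str]
    decisions: Dict[str, Decision] = field(default_factory=dict)

    def record(self, consortium: str, decision: Decision) -> None:
        # Only committees whose data are requested may decide on the request.
        if consortium not in self.consortia:
            raise ValueError(f"{consortium} is not part of this request")
        self.decisions[consortium] = decision

    def status(self) -> Decision:
        # Committees that have not yet decided count as pending.
        votes = [self.decisions.get(c, Decision.PENDING) for c in self.consortia]
        if Decision.DENIED in votes:
            return Decision.DENIED  # assumption: one denial blocks the request
        if votes and all(v is Decision.APPROVED for v in votes):
            return Decision.APPROVED
        return Decision.PENDING


request = DataRequest("Survival by genomic finding", ["INRG", "INTERACT"])
request.record("INRG", Decision.APPROVED)
print(request.status().value)  # pending: INTERACT has not decided yet
request.record("INTERACT", Decision.APPROVED)
print(request.status().value)  # approved
```

Keeping one decision per consortium, rather than a single yes/no flag on the request, mirrors the point made above: each disease group decides independently how its data are used.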

How do collaborators join the PCDC Consortium?

Contributing data to the PCDC

Collaborators interested in contributing data for a disease area that is already part of the PCDC may contact our team to discuss key governance and data considerations as well as estimated project timelines. If approved, the process of creating, updating, and executing contracts, data sharing agreements, and Memoranda of Understanding (as applicable) typically takes one to six months depending on country, consortium, legal teams, and type(s) of data involved. After key governance processes have been completed, the collaborator works with PCDC developers to plan data harmonization and integration into the commons and to resolve harmonization and quality control concerns. Depending on several factors, this harmonization and integration process often takes roughly one to six months to complete. Adding data to the commons may further depend on funding to support the transformation of data into a common data model.

Adding new cancers to the PCDC

Over time we plan to extend the PCDC to include additional pediatric cancers. Collaborators interested in creating a data commons for a new cancer should contact the PCDC team to learn more about the key steps, timeline, costs, and sustainability considerations. Time from initial discussions to the establishment of a disease-group consortium and creation of a Memorandum of Understanding (MOU) may range from three months to a year. Next, data use agreements (DUAs) are created, a data dictionary is established, and consortium leaders establish working groups to drive ongoing development and productivity. Once DUAs have been signed and the data dictionary has been established, PCDC developers can begin to map and harmonize the data and perform quality control checks. While the PCDC has been able to streamline the process of commons development, the ultimate pace at which a commons is developed and launched depends heavily on the frequency and intensity of involvement from collaborators, the number and diversity of consortium members involved, and the amount of initial funding available. The process from MOU sign-off to commons launch may take as little as six months to a year with substantial dedication from consortium leaders. Additional commons development, consortium meetings, and data integration continue at regular intervals from this point onward.

Can individual researchers or sites contribute data to the PCDC?

To date, the PCDC has only accepted data from multi-site cooperative groups or, in some cases, nationwide research organizations. Because of the time, resources, and effort involved in integrating each entity into the commons, the PCDC primarily engages with cooperative research groups. If you are an individual researcher or research site interested in joining the PCDC, please contact our team to discuss further.

What quality assurance and harmonization measures are taken when data are added to the commons?

Quality assurance is very important to the PCDC, so each data source is vetted for policies and procedures that help ensure high quality. In addition, the PCDC runs a series of QA/QC scripts to confirm internal consistency of the data. Our team works closely with data managers and stakeholders from all groups to help with the QC processes.
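As a rough illustration of what an internal consistency check might look like, here is a minimal Python sketch. It is not one of the PCDC's QA/QC scripts: the field names (`age_at_diagnosis_days`, `age_at_last_contact_days`, `age_at_death_days`, `vital_status`) and the three rules are assumptions chosen for the example.

```python
from typing import List


def check_record(record: dict) -> List[str]:
    """Return a list of consistency problems found in one subject record.

    All field names are hypothetical; ages are expressed in days.
    """
    problems = []
    diagnosis = record.get("age_at_diagnosis_days")
    last_contact = record.get("age_at_last_contact_days")
    death = record.get("age_at_death_days")

    if diagnosis is not None and diagnosis < 0:
        problems.append("negative age at diagnosis")
    if diagnosis is not None and last_contact is not None and last_contact < diagnosis:
        problems.append("last contact precedes diagnosis")
    if record.get("vital_status") == "alive" and death is not None:
        problems.append("alive subject has an age at death")
    return problems


# Made-up example records, for illustration only.
records = [
    {"subject_id": "S1", "age_at_diagnosis_days": 800,
     "age_at_last_contact_days": 2100, "vital_status": "alive",
     "age_at_death_days": None},
    {"subject_id": "S2", "age_at_diagnosis_days": 950,
     "age_at_last_contact_days": 400, "vital_status": "alive",
     "age_at_death_days": None},
]

for rec in records:
    for problem in check_record(rec):
        print(rec["subject_id"], problem)  # S2 last contact precedes diagnosis
```

Returning a list of problems, instead of stopping at the first one, lets a data manager see every issue in a record in a single pass.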

How are PCDC consortia formed?

In some cases, pre-established consortia have approached the PCDC to create a disease-specific commons. In other cases, the PCDC works closely with leaders in a specific pediatric cancer type to identify interested cooperative research groups who are willing to form a consortium to guide commons development and steer future progress. This process may take months of emails, calls, and occasional international face-to-face meetings. Eventually, all committed parties sign a Memorandum of Understanding and establish the necessary contracts, data sharing agreements, and governance mechanisms to guide the consortium moving forward.

Will non-pediatric cancer disease groups be included in the PCDC?

At the moment, the PCDC is focused on pediatric cancers. We recognize, however, that this approach to building commons is applicable across many other disease groups, especially other rare diseases. We encourage researchers working in other cancers or in rare disease specialties to contact us to discuss future opportunities and solutions.


Pediatric Cancer Data Commons

The Pediatric Cancer Data Commons (PCDC) brings together clinical, genomic, and imaging data from institutions around the world that are working alongside us to transform pediatric cancer research and outcomes.

Headquartered at the University of Chicago, the PCDC works with international leaders in pediatric cancers and the National Cancer Institute to develop and apply uniform data standards that facilitate the collection, combination, and analysis of data from many different sources. By harmonizing existing clinical research data and leading international efforts to standardize data collection, we are breaking down long-standing barriers that have held back advancements in research on rare diseases.

We streamline the process of bringing data together into a commons through our centralized expertise in clinical cancer research, technology, data standards, and the international legal and regulatory landscape. The result is significantly more cases available for study, expedited sharing of data across borders, and reduced time and cost for physicians and researchers. Our aim is to leverage our unique collaborative, consortium-based approach to enable new and meaningful discoveries about pediatric cancers.

In autumn 2019, thanks to funding from St. Baldrick’s Foundation, we began bringing together all of our commons-building work to form the Pediatric Cancer Data Commons Consortium. The PCDC Consortium is now developing a common core data dictionary and common governance structure spanning seven pediatric cancers: neuroblastoma, soft tissue sarcoma, acute myeloid leukemia, acute lymphoblastic leukemia, germ cell tumors, bone tumors, and Hodgkin lymphoma. This work will enable innovative cross-disease research as well as set a standard for future cancer data commons endeavors.

A data what…?

Researchers often store their information in databases or patient registries. These databases are usually siloed sets of a single type of data (e.g., clinical, genomic, pathology, or imaging data) that are stored in one place, such as an Excel spreadsheet. A data commons is an environment that facilitates the connection and aggregation of multiple disparate types of data into a single cloud-based environment for researchers. Data commons are accessed via data platforms such as a website or application on a mobile device or computer. Data platforms may also include tools for discovering cohorts of patients, aggregating disparate types of information, and analyzing and visualizing the data. Permissions to see, analyze and sometimes download the data are regulated by governance rules established by each consortium.

Illustration showing multiple different databases interoperating with each other in a data commons, with data displaying on computer and tablet screens via a data platform

Taking a new approach

The PCDC has several distinguishing factors contributing to our success and providing much-needed solutions to challenges and shortcomings in pediatric cancer research.

Strength in numbers–and diversity

The most significant barrier to large-scale discovery and advancement in pediatric cancers is the lack of data and samples for study. For these rare diseases, inability to share data collected during clinical trials is a devastating blow for innovation and discovery. We are here to solve this problem. The PCDC engages researchers around the world and enables them to easily share and harmonize data they’ve collected, perform cohort searches, and request and analyze data from the commons. Our expertise in international privacy laws helps us streamline and expedite international data sharing agreements. This allows the PCDC to collect data on more patients from more diverse backgrounds, thus helping pediatric cancer researchers obtain larger sample sizes and richer data with which to power their work.

Establishing common data standards

Defining essential data elements and reaching universal consensus on a data dictionary are critical to creating a sustainable and effective data commons. Once consensus on data standards and definitions has been reached, we can harmonize researchers’ retrospective data sets for storage in the commons, and researchers can collect prospective data in unison according to the adopted standards.
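To make retrospective harmonization concrete, the following Python sketch maps each contributing group's local codes for one variable onto a shared vocabulary. Everything here is hypothetical: the group names, local codes, and standard terms are invented for illustration and are not taken from any PCDC data dictionary.

```python
from typing import Optional

# Hypothetical per-group mappings from local tumor-site codes to agreed terms.
SITE_MAP = {
    "group_a": {"ADR": "adrenal gland", "ABD": "abdomen"},
    "group_b": {"1": "adrenal gland", "2": "abdomen", "3": "thorax"},
}


def harmonize_site(source: str, local_value: str) -> Optional[str]:
    """Translate a local code to the standard term, or None if unmapped."""
    mapping = SITE_MAP.get(source, {})
    return mapping.get(local_value.strip().upper())


rows = [("group_a", "adr"), ("group_b", "3"), ("group_a", "XYZ")]
harmonized = [harmonize_site(source, value) for source, value in rows]
unmapped = [row for row, term in zip(rows, harmonized) if term is None]

print(harmonized)  # ['adrenal gland', 'thorax', None]
print(unmapped)    # [('group_a', 'XYZ')]
```

Unmapped values come back as `None` rather than being guessed, so they can be collected and reviewed by people who know the source data.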

Consortium-based decision making

The PCDC commons development process convenes international leaders and cooperative groups studying each pediatric cancer we address to provide disease-specific expertise throughout the development and implementation of their data standards and commons. Through robust data governance procedures, we ensure that the commons we develop meet the needs of each group, address any previous barriers to successful collaboration, and facilitate trust and consensus among all members.

PCDC News

Get involved and follow our progress with updates from our quarterly newsletter.
Sign up for our email list here, and check out past issues below.

January 2021
October 2020
July 2020
April 2020
January 2020

Consortia

ALL Data Commons

INTERACT

INSTRuCT

INRG

MaGIC

Frequently Asked Questions

What makes the PCDC different from other cancer data commons efforts?

There are a number of cancer data sharing initiatives, each of which has its own priorities and goals to help solve a piece of the cancer puzzle. Several of these initiatives are, like us, devoted to pediatric cancer. For example, the Gabriella Miller Kids First Data Resource Center has a distinct focus on exploring the genetic underpinnings and connections between pediatric cancers and structural birth defects, while the St. Jude Cloud helps researchers explore and visualize next-generation sequencing data for pediatric cancers and other diseases.

Several factors make the PCDC unique. Our focus is on collecting, standardizing, and harmonizing disparate sources and types of pediatric cancer clinical research data from consortia and research groups around the world. Our team works with global consortia to create international clinical data standards allowing us to gather and store some of the world’s largest collections of interoperable pediatric cancer clinical data. Furthermore, our goal is to connect these clinical data to the information in other commons, thus enriching the information in those valuable resources. Finally, we aim to make PCDC data research-ready by providing researchers with tools to perform cohort discovery, data visualization, and data analysis in real time on the commons platform.

Why do we need a new way of aggregating research data?

While researchers have formed cooperative groups to share and standardize data for several decades, there are several reasons that the scientific community now needs a new vision for data sharing.

Manual processes: Currently, clinical research coordinators document and maintain extensive research records in paper binders. Paper documentation and manual extraction into electronic databases are at best inefficient and at worst error-prone and dangerous. Multiple layers of data checking and auditing are required to help ensure completeness and accuracy. Physical storage and archiving of research shadow binders alongside electronic databases is costly and can result in misplaced or lost protected health information (PHI). Let’s face it: pediatric clinical trials are stuck in the 1990s. Automatic extraction of data from electronic health records into secure databases reduces the opportunity for errors in data sets; stores patients’ data more securely; and reduces the workload of clinical research coordinators, allowing them to focus more energy on patient-related research concerns.

Lack of data standards: When data are gathered independently and siloed in individual databases, it can be difficult or impossible to merge the data together. For example, researchers may have different ways of categorizing tumor types and sites. If these categories and values cannot be aligned, the data on these variables cannot be compared. Even when data can be combined, the process of harmonizing data from disparate sources retrospectively can be cumbersome and extremely time-consuming. In order to aggregate enough patient data to power studies that are capable of making truly significant discoveries in pediatric oncology, researchers around the world must work together to establish universal data standards so that they can collect data prospectively in unison. Our team is passionate about standards and interoperability.

Differing data models: There are several oncology data models in existence at present. The international pediatric oncology field must work together to create an agreed-upon, comprehensive data model alongside its data standards consensus work so that data from different sources can be unified in a singular location. The PCDC is working to create a culture of data stewardship in the pediatric oncology research community, encouraging everyone to take responsibility for how data are collected, stored, and shared.

International regulations: Significant advances in pediatric cancer research depend on large sample sizes and rich data sets. Now that the necessary technology and clinical research data exist worldwide, researchers have been trying to share data across borders to make more meaningful discoveries together. However, laws and regulations around clinical research data, protected health information (PHI), and data privacy and security make sharing data sets across international boundaries difficult and costly. Research teams enter into data sharing agreements and contracts with the help of their respective legal teams. Though the creation of cooperative research groups has helped to streamline some of these processes, this work must still be duplicated by researchers for each disease being studied. By creating a central pediatric cancer data commons, our team is able to utilize our experts in international data laws and data governance to streamline this process for cooperative groups across multiple pediatric cancers. This centralized expertise expedites the laborious process of contract writing and significantly reduces costs and time for researchers, so that they can focus their time and funds on their research and patients.

How do you ensure your data standards are kept up to date?

Once data standards have been collectively determined by the cooperating groups in a consortium, maintenance of those data standards is crucial to the sustainability of data quality. We have created a change request process through which any contributing institution or group can submit a desired alteration or addition to the data standard. An initial change request form, available through our University of Chicago REDCap platform, provides a uniform method for collecting, processing, and addressing submitted requests.

It takes a village to maintain standards. Our efforts are focused on inclusion and transparency when it comes to data and standards. Rather than select or create a single standard, we work to include as many stakeholders as possible, including disease groups from around the world, the National Cancer Institute (NCI) and National Institutes of Health (NIH), the Clinical Data Interchange Standards Consortium (CDISC), and the CTSA National Clinical Data to Health (CD2H) program. By encouraging cooperation between these and other standards groups, we hope to build a community of invested researchers and stakeholders who will work to maintain shared standards, thus facilitating interoperability of precious clinical and research data from children with cancer. Our commitment to data standards and interoperability is nationally recognized, and we are currently leveraging our expertise in this area to help lead the development of the NCI’s Center for Cancer Data Harmonization, a nationwide effort to make cancer research data more accessible, organized, and powerful.

What are data dictionaries and data models?

A data dictionary is a structured database composed of an exhaustive list of data fields with respective definitions and metadata details (including data type, source, and units of measurement when applicable). Building a standardized set of definitions for a data commons has several benefits. First, the initial process of building a data dictionary allows a group to craft a clear and concise definition of each term, which can stand as the example against which other definitions are compared. Second, the data dictionary assists with data harmonization. When a new group plans to contribute its data to a commons, the data dictionary provides a clear starting point for that group to see how it might best align its data model to that of the commons. Finally, a properly maintained data dictionary will reflect any changes and additions that are made to the data model, serving as a reference for each contributing institution as the data commons evolves.

A data model is an abstract model designed to gather and organize information about data and show this information as data entities and the relationships among them. It shows how data will be stored in a data commons and is informed by the data dictionary.

Creating a standard data model is crucial because it dictates how a data commons and its developers will manage incoming data. Alongside the data dictionary, it is the basis for how developers determine whether new data are in the correct format to be added to the commons. Having this diagram, or conceptual map, also aids users. With an accurate data model, researchers and physicians who have been approved to obtain a specific data set from the commons are able to receive their data more quickly and easily. Ultimately, a data model helps a data commons consortium and development team standardize and expedite the process of data acquisition, storage, and retrieval.
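The relationship between a data dictionary and incoming data can be illustrated with a small Python sketch. It assumes a deliberately simplified dictionary format; the two fields, their definitions, and the permitted values are hypothetical examples rather than entries from a PCDC dictionary.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class DictionaryField:
    """One data dictionary entry: name, definition, and metadata."""
    name: str
    definition: str
    data_type: type
    unit: Optional[str] = None
    permitted_values: Optional[FrozenSet[str]] = None


# Hypothetical two-field dictionary, keyed by field name.
DICTIONARY = {
    f.name: f
    for f in [
        DictionaryField("age_at_diagnosis", "Age of the subject at diagnosis",
                        int, unit="days"),
        DictionaryField("vital_status", "Status of the subject at last contact",
                        str, permitted_values=frozenset({"alive", "dead", "unknown"})),
    ]
}


def validate(record: dict) -> List[str]:
    """Check each submitted value against its dictionary entry."""
    errors = []
    for name, value in record.items():
        spec = DICTIONARY.get(name)
        if spec is None:
            errors.append(f"{name}: not in data dictionary")
        elif not isinstance(value, spec.data_type):
            errors.append(f"{name}: expected {spec.data_type.__name__}")
        elif spec.permitted_values is not None and value not in spec.permitted_values:
            errors.append(f"{name}: value {value!r} not permitted")
    return errors


print(validate({"age_at_diagnosis": 812, "vital_status": "Alive", "stage": "4"}))
# ["vital_status: value 'Alive' not permitted", 'stage: not in data dictionary']
```

A check like this is one way developers could decide whether new data are in the correct format before they are added to a commons.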

How do you keep data safe?

The University of Chicago Center for Research Informatics, which maintains the PCDC’s technical infrastructure, has a long history of building and supporting systems to be used for protected health information (PHI). We have an entire team devoted to making sure that our systems adhere to HIPAA security and privacy regulations. We retain an expert on GDPR (European) privacy rules. Even though most of the data we store in our commons are not PHI, we treat all of them as such. Importantly, we have processes in place to ensure that the needs of each contributor can be met by assigning certain rules or embargo periods to data that are submitted. We want every research team to feel comfortable with us as stewards of their data, which is why data are only collected, stored, and distributed according to terms defined in each data use agreement.

Why use a data commons instead of REDCap or other platforms?

We love REDCap and use it for many of our projects at the University of Chicago. As a survey tool, REDCap is a valuable platform for clinical research, and many researchers have been trained to use REDCap and have utilized it in research projects. While REDCap remains useful for researchers in many ways, it has several significant limitations when compared to the power of using a data commons to collect, store, and share research data. First, REDCap is not capable of aggregating disparate types of data (clinical, imaging, genomics, pathology, etc.) together. Second, there is a cap on how much data researchers can store in REDCap, limiting the size of data sets that can be studied. Third, manually entering clinical data into REDCap is labor-intensive and creates additional opportunities for user error, compared to automatic extraction of data from electronic health records into a data commons. Finally, data accrued in REDCap are still siloed: REDCap does not address the longer-term data standardization needs of the research community, nor does it eliminate the need for data harmonization.

Beyond REDCap, there are many other new and exciting ways to collect, store, and share data. We are thrilled to see so many fantastic technologies emerge to help researchers. In many cases, the PCDC can interoperate with these platforms to share data and tools, especially if these systems have adopted best practices for research data. While there are many ways to leverage data for research, we feel that our approach is unique and important for creating and sustaining the culture of data sharing and interoperability that is so critical for making advances in cancer research. We look forward to working to fight children’s cancer together.

How do collaborators access PCDC data and/or contribute data to the commons?

Learn about how you can participate in the PCDC project here.

How can I help support the PCDC?

Thank you for your interest in supporting our work! The PCDC is sustained through research grants, public-private partnerships, and generous philanthropic support from organizations and individuals. Learn more and make a gift here.



    Volchenboum Lab
    University of Chicago
    900 East 57th Street
    Chicago, IL 60637
