How can data commons change how the world does research?

The potential for a commons-based approach to change the landscape of cancer research is explored in two of our recent publications. In “Using big data in pediatric oncology: Current applications and future directions” in Seminars in Oncology, we discuss the uses of big data in pediatric cancer, existing pediatric cancer registry initiatives and research, the challenges in harmonizing data to improve accessibility for study, and the future opportunities we see for innovation in this area. In “Data Commons to Support Pediatric Cancer Research” in the American Society of Clinical Oncology Educational Book, we describe current data commons and how they operate in the oncology landscape, and offer a practical paradigm for developing new commons. Centralizing data, processing power, and tools creates a valuable opportunity to share resources and thus increase the efficiency, power, and impact of research.

You can follow along with our publications on ResearchGate.

A paradigm for building a pediatric cancer data commons

  1. Engage cooperative group(s)
  2. Define scope
  3. Identify funding source
  4. Identify infrastructure
  5. Engage project team
  6. Identify data sources
  7. Establish governance, create policies and procedures
  8. Create contributor / use agreements
  9. Form standards working group to create data dictionary and map data elements
  10. Create database
  11. Build front-end query engine
  12. Create and execute communication and education plans
  13. Create sustainability model

Data commons power discovery.

Transforming the Way Researchers Share Data: Lessons from the Pediatric Cancer Data Commons
Presented by Sam Volchenboum, MD, PhD

University of Chicago Department of Pediatrics Grand Rounds, November 2020

FAQ for Researchers

How can researchers get data from the PCDC?

The PCDC is designed to protect the data of research subjects while maximizing the benefit to researchers. Any researcher can freely register to use the PCDC. Once registered, they can access aggregate data, and soon visualizations, within the commons platform. Data sets with line-level data can be requested through the commons portals, and this access is governed by the respective data-contributing consortium. Once a request is submitted, the executive committee for the relevant consortium will approve or deny the request, and our PCDC staff will follow up with the researcher accordingly.

Can researchers get data from the PCDC across multiple disease groups (e.g., survival data for a genomic finding found in both liquid and solid tumors)?

Yes, with the approval of the executive committee of each relevant disease group consortium. This approval process will be as streamlined as possible; under the governance plan currently being developed and refined, such project requests will come through the PCDC Executive Committee but will still require approval from the individual disease group executive committees. While this may appear onerous, it is the only way to ensure that each disease group retains its autonomy in deciding how its data are used for research. Thus far, the disease consortia have supported this vision.

How do collaborators join the PCDC Consortium?

Contributing data to the PCDC

Collaborators interested in contributing data for a disease area that is already part of the PCDC may contact our team to discuss key governance and data considerations as well as estimated project timelines. If approved, the process of creating, updating, and executing contracts, data sharing agreements, and Memoranda of Understanding (as applicable) typically ranges from one to six months depending on country, consortium, legal teams, and type(s) of data involved. After key governance processes have been completed, the collaborator works with PCDC developers to discuss data harmonization and integration into the commons and to resolve harmonization and quality control concerns. This harmonization and integration process depends on several factors and often takes roughly one to six months to complete. Adding data to the commons may further depend on funding to support the transformation of data into a common data model.

Adding new cancers to the PCDC

Over time, we plan to extend the PCDC to include additional pediatric cancers. Collaborators interested in creating a data commons for a new cancer should contact the PCDC team to learn more about the key steps, timeline, costs, and sustainability considerations. Time from initial discussions to the establishment of a disease-group consortium and creation of a Memorandum of Understanding (MOU) may range from three months to a year. Next, data use agreements (DUAs) are created, a data dictionary is established, and consortium leaders establish working groups to drive ongoing development and productivity. Once DUAs have been signed and the data dictionary has been established, PCDC developers can begin to map and harmonize the data and perform quality control checks. While the PCDC has been able to streamline the process of commons development, the ultimate pace at which a commons is developed and launched depends heavily on the frequency and intensity of involvement from collaborators, the number and diversity of consortium members involved, and the amount of initial funding available. With substantial dedication from consortium leaders, the process from MOU sign-off to commons launch may take as little as six to twelve months. Additional commons development, consortium meetings, and data integration continue at regular intervals from this point onward.

Can individual researchers or sites contribute data to the PCDC?

To date, the PCDC has only accepted data from multi-site cooperative groups or, in some cases, nationwide research organizations. Because of the time, resources, and effort involved in integrating each entity into the commons, the PCDC primarily engages with cooperative research groups. If you are an individual researcher or research site interested in joining the PCDC, please contact our team to discuss further.

What quality assurance and harmonization measures are taken when data are added to the commons?

Quality assurance is very important to the PCDC, so each data source is vetted for policies and procedures that help ensure high quality. In addition, the PCDC runs a series of QA/QC scripts to confirm internal consistency of the data. Our team works closely with data managers and stakeholders from all groups to help with the QC processes.
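
For readers curious what an internal-consistency check might look like, here is a minimal sketch in Python. It is an illustration only, not the PCDC's actual QA/QC scripts: the field names (subject_id, date_of_birth, date_of_diagnosis, date_of_death, vital_status) and the rules themselves are assumptions made for this example and are not taken from any PCDC data dictionary.

```python
from datetime import date

# Hypothetical record layout: every field name below is an assumption made
# for illustration and does not come from the PCDC data dictionary.


def check_record(record):
    """Return a list of internal-consistency problems found in one record."""
    problems = []
    birth = record.get("date_of_birth")
    diagnosis = record.get("date_of_diagnosis")
    death = record.get("date_of_death")
    status = record.get("vital_status")

    # Dates should follow the order birth <= diagnosis <= death.
    if birth is not None and diagnosis is not None and diagnosis < birth:
        problems.append("diagnosis precedes birth")
    if diagnosis is not None and death is not None and death < diagnosis:
        problems.append("death precedes diagnosis")

    # Vital status and date of death should agree with each other.
    if status == "dead" and death is None:
        problems.append("vital status is dead but no date of death")
    if status == "alive" and death is not None:
        problems.append("vital status is alive but a date of death is present")
    return problems


def check_dataset(records):
    """Map each failing subject_id to its list of problems."""
    report = {}
    for record in records:
        problems = check_record(record)
        if problems:
            report[record["subject_id"]] = problems
    return report


# Two made-up records: the first is consistent, the second is not.
records = [
    {
        "subject_id": "S1",
        "date_of_birth": date(2010, 1, 5),
        "date_of_diagnosis": date(2015, 3, 2),
        "date_of_death": None,
        "vital_status": "alive",
    },
    {
        "subject_id": "S2",
        "date_of_birth": date(2012, 6, 1),
        "date_of_diagnosis": date(2011, 1, 1),
        "date_of_death": None,
        "vital_status": "dead",
    },
]

report = check_dataset(records)
for subject_id, problems in report.items():
    print(subject_id, problems)
```

In this sketch, only the second record is flagged. A real pipeline would likely draw its rules from the consortium's data dictionary rather than hard-coding them as done here.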

How are PCDC consortia formed?

In some cases, pre-established consortia have approached the PCDC to create a disease-specific commons. In other cases, the PCDC works closely with leaders in a specific pediatric cancer type to identify interested cooperative research groups who are willing to form a consortium to guide commons development and steer future progress. This process may take months of emails, calls, and occasional international face-to-face meetings. Eventually, all committed parties sign a Memorandum of Understanding and establish the necessary contracts, data sharing agreements, and governance mechanisms to guide the consortium moving forward.

Will non-pediatric cancer disease groups be included in the PCDC?

At the moment, the PCDC is focused on pediatric cancers. We recognize, however, that this approach to building commons is applicable across many other disease groups, especially other rare diseases. We encourage researchers working in other cancers or in rare disease specialties to contact us to discuss future opportunities and solutions.