  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) group terms and set precedence
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
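The operators listed above can be combined into a single query string. The sketch below builds such strings in Python. The helper names (`phrase`, `fuzzy`, `prefix`, `all_of`, `any_of`, `exclude`) are hypothetical and exist only for illustration; the code produces text for the search box and makes no assumption about any programmatic search interface.

```python
from typing import Optional

# Hypothetical helpers: they only assemble query text using the operators
# listed above. None of these names belong to a real search API.

def phrase(text: str, slop: Optional[int] = None) -> str:
    """Wrap text in quotes for a phrase search; ~N adds a slop amount."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"

def fuzzy(word: str, distance: int) -> str:
    """Append ~N to a single word to allow an edit distance of N."""
    return f"{word}~{distance}"

def prefix(stem: str) -> str:
    """Append * to a keyword for a wildcard search."""
    return f"{stem}*"

def exclude(term: str) -> str:
    """Prefix a term with - for a NOT operation."""
    return f"-{term}"

def all_of(*terms: str) -> str:
    """Join terms with + (AND) and group them with parentheses."""
    return "(" + " + ".join(terms) + ")"

def any_of(*terms: str) -> str:
    """Join terms with | (OR) and group them with parentheses."""
    return "(" + " | ".join(terms) + ")"

query = all_of(
    any_of(prefix("onco"), phrase("cancer registry", slop=2)),
    fuzzy("genomic", 1),
    exclude("imaging"),
)
print(query)
```

Parentheses are always emitted so that nested AND/OR groups keep the intended precedence, even where the default AND would make them redundant.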
Found 29 result(s)
Project Data Sphere, LLC, operates a free digital library-laboratory where the research community can broadly share, integrate and analyze historical, de-identified, patient-level data from academic and industry cancer Phase II-III clinical trials. These patient-level datasets are available through the Project Data Sphere platform to researchers affiliated with life science companies, hospitals and institutions, as well as independent researchers, at no cost and without requiring a research proposal.
The National Cancer Data Base (NCDB), a joint program of the Commission on Cancer (CoC) of the American College of Surgeons (ACoS) and the American Cancer Society (ACS), is a nationwide oncology outcomes database for more than 1,500 Commission-accredited cancer programs in the United States and Puerto Rico. Some 70 percent of all newly diagnosed cases of cancer in the United States are captured at the institutional level and reported to the NCDB. The NCDB, begun in 1989, now contains approximately 29 million records from hospital cancer registries across the United States. Data on all types of cancer are tracked and analyzed. These data are used to explore trends in cancer care, to create regional and state benchmarks for participating hospitals, and to serve as the basis for quality improvement.
The Progenetix database provides an overview of copy number abnormalities in human cancer, currently drawn from 32,548 array and chromosomal Comparative Genomic Hybridization (CGH) experiments as well as Whole Genome or Whole Exome Sequencing (WGS, WES) studies. The cancer profile data in Progenetix were curated from 1,031 articles and represent 366 different cancer types, according to the International Classification of Diseases for Oncology (ICD-O).
The Cancer Immunome Database (TCIA) provides results of comprehensive immunogenomic analyses of next-generation sequencing (NGS) data for 20 solid cancers from The Cancer Genome Atlas (TCGA) and other data sources. The Cancer Immunome Atlas (TCIA) was developed and is maintained at the Division of Bioinformatics (ICBI). The database can be queried for the gene expression of specific immune-related gene sets, cellular composition of immune infiltrates (characterized using gene set enrichment analyses and deconvolution), neoantigens and cancer-germline antigens, HLA types, and tumor heterogeneity (estimated from cancer cell fractions). Moreover, it provides survival analyses for different types of immunological parameters. TCIA will be constantly updated with new data and results.
CaPSURE™ is a longitudinal, observational study of approximately 15,000 men with all stages of biopsy-proven prostate cancer. Patients have enrolled at 43 community urology practices, academic medical centers, and VA hospitals throughout the United States since 1995. CEASAR stands for Comparative Effectiveness Analysis of Surgery and Radiation. The ongoing goal of CEASAR is to help learn more about what prostate cancer treatments work best, for which patients, in whose hands. There are currently about 3,600 men with a prostate cancer diagnosis participating in CEASAR. Three rounds of surveys have been completed, with the first carried out in the spring of 2010. We are currently in the process of conducting our fourth survey with the same group of men in our study. This survey, our Three Year Follow-up, will occur throughout the summer of 2014.
4DGenome is a public database that archives and disseminates chromatin interaction data. Currently, 4DGenome contains over 8,038,248 interactions curated from both experimental studies (high throughput and individual studies) and computational predictions. It covers five organisms: Homo sapiens, Mus musculus, Drosophila melanogaster, Plasmodium falciparum, and Saccharomyces cerevisiae.
arrayMap is a repository of cancer genome profiling data. Original data from primary repositories (e.g. NCBI GEO, EBI ArrayExpress, TCGA) is re-processed and annotated for metadata. Unique visualisation of the processed data allows critical evaluation of data quality and genome information. Structured metadata provides easy access to summary statistics, with a focus on copy number aberrations in cancer entities.
BioGrid Australia Limited operates a federated data sharing platform for collaborative translational health and medical research. It provides a secure infrastructure that advances health research by linking privacy-protected and ethically approved data among a wide network of health collaborators. BioGrid links real-time de-identified health data across institutions, jurisdictions and diseases to help researchers and clinicians improve their research and clinical outcomes. The web-based infrastructure provides ethical access while protecting both privacy and intellectual property.
The Australian Breast Cancer Tissue Bank (ABCTB) provides data contributed by an Australian network of cancer clinicians, researchers, and patients. ABCTB privacy protection policy ensures patients' identities are not revealed and cancer researchers are the only individuals with open access to data.
The CancerData site is an effort of the Medical Informatics and Knowledge Engineering team (MIKE for short) of Maastro Clinic, Maastricht, The Netherlands. Our activities in the field of medical image analysis and data modelling are visible in a number of projects we are running. CancerData offers several datasets, which are grouped into collections and can be public or private. You can search for public datasets in the NBIA (National Biomedical Imaging Archive) image archives without logging in.
Established by the HLA Informatics Group of the Anthony Nolan Research Institute, IPD provides a centralized system for studying polymorphism in genes of the immune system. The IPD maintains databases concerning the sequences of human Killer-cell Immunoglobulin-like Receptors (KIR), sequences of the major histocompatibility complex in a number of species, human platelet antigens (HPA), and tumor cell lines. Each subject has related, credible news, current research and publications, and a searchable database for highly specific, research-grade genetic information.
Patient-derived tumor xenograft (PDX) mouse models are an important oncology research platform to study tumor evolution, drug response and personalised medicine approaches.
The dbMHC database provides an open, publicly accessible platform for DNA and clinical data related to the human Major Histocompatibility Complex (MHC). The dbMHC provides access to human leukocyte antigen (HLA) sequences, HLA allele and haplotype frequencies, and clinical datasets.
The Erythron Database is a resource dedicated to facilitating better understanding of the cellular and molecular underpinnings of mammalian erythropoiesis. The resource is built upon a searchable database of gene expression in murine primitive and definitive erythroid cells at progressive stages of maturation.
XNAT CENTRAL is a publicly accessible data-sharing portal at Washington University Medical School using XNAT software. XNAT provides neuroimaging data through a web interface and a customizable open source platform. XNAT facilitates data uploads and downloads for data sharing, processing and organization.
The Cancer in Young People in Canada (CYP-C) surveillance program collects in-depth data concerning risk factors, health outcomes, quality and accessibility of care, and late effects among children and youth with cancer. CYP-C represents a collaboration involving the C17 Council, Canadian Partnership Against Cancer (CPAC), Public Health Agency of Canada (PHAC), provincial and territorial cancer registries, Statistics Canada and non-governmental organizations.
The population-based cancer registries in each German federal state transfer data to the German Centre for Cancer Registry Data, as required by the Federal Cancer Registry Data Act. These data are combined, quality-checked, analysed and evaluated, and the results published in collaboration with the public health institutions of the federal states.
A premier source for United States cancer statistics, SEER gathers information on incidence, prevalence, and survival from specific geographic areas that represent 28 percent of the population, compiles related reports, and reports on national cancer mortality rates. Its aim is to provide information related to cancer statistics and decrease the burden of cancer in the national population. SEER has been collecting data from cancer cases since 1973.
With its “Blood Donor BIOBANK”, the Bavarian Red Cross (BRK) Blood Donor Service offers a unique and innovative resource for biomarker research: the world’s first blood donor based biobank. Biobanks as collections of biological material together with associated medical data open new possibilities for the development of new targeted diagnostics and therapies. The BRK Blood Donor Service maintains a unique collection of over 3 million blood samples, making it one of the largest sample collections worldwide. Every working day 2,000 new samples are added to the collection.
TCIA is a service which de-identifies and hosts a large archive of medical images of cancer accessible for public download. The data are organized as “collections”; typically patients’ imaging related by a common disease (e.g. lung cancer), image modality or type (MRI, CT, digital histopathology, etc) or research focus. Supporting data related to the images such as patient outcomes, treatment details, genomics and expert analyses are also provided when available.
The Centre for Clinical Trials Cologne (ZKS Köln) aims to support all processes of clinical trials and the quality of patient-oriented clinical research in an academic environment. It supports doctors of the University Hospital of Cologne, other clinics, study groups and professional associations in the design and conduct of clinical trials. For the pharmaceutical industry and contract research organizations, the ZKS Köln is a partner close to clinical practice for medical research projects.
The NCI's Genomic Data Commons (GDC) provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC obtains validated datasets from NCI programs in which the strategies for tissue collection couple quantity with high quality. Tools are provided to guide data submissions by researchers and institutions.
>>>!!!<<< caArray Retirement Announcement >>>!!!<<< The National Cancer Institute (NCI) Center for Biomedical Informatics and Information Technology (CBIIT) instance of the caArray database was retired on March 31st, 2015. All publicly accessible caArray data and annotations will be archived and will remain available via FTP download; they are also available at GEO. >>>!!!<<< While NCI will not be able to provide technical support for the caArray software after the retirement, the source code is available on GitHub, and we encourage continued community development. Molecular Analysis of Brain Neoplasia (Rembrandt fine-00037) gene expression data has been loaded into ArrayExpress: >>>!!!<<< caArray is an open-source, web and programmatically accessible microarray data management system that supports the annotation of microarray data using MAGE-TAB and web-based forms. Data and annotations may be kept private to the owner, shared with user-defined collaboration groups, or made public. The NCI instance of caArray hosts many cancer-related public datasets available for download.
The SICAS Medical Image Repository is a freely accessible repository containing medical research data including medical images, surface models, clinical data, genomics data and statistical shape models. The data can freely be organized and shared on SMIR and made publicly accessible with a DOI. Dedicated data sets are organized as collections of anatomical regions (e.g. Cochlea). The data can be filtered using a modular search and accessed on the web or through the SMIR API.
The sequencing of several bird genomes and the anticipated sequencing of many more provided the impetus to develop a model organism database devoted to the taxonomic class Aves. Birds provide model organisms important to the study of neurobiology, immunology, genetics, development, oncology, virology, cardiovascular biology, evolution and a variety of other life sciences. Many bird species are also important to agriculture, providing an enormous food source worldwide. Genomic approaches are proving invaluable to studying traits that affect meat yield, disease resistance, behavior, and bone development, along with many other factors affecting productivity. In this context, BirdBase will serve both biomedical and agricultural researchers.