  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) imply priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
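The operators listed above can be combined into a single query string. The following is a minimal sketch in Python that only assembles strings according to that syntax. The helper names (`prefix`, `phrase`, `fuzzy`, `any_of`, `all_of`, `exclude`) are hypothetical and not part of any search API, and how whitespace around operators is treated is an assumption that may differ on the actual search backend.

```python
from typing import Optional


def prefix(stem: str) -> str:
    """Wildcard search: * at the end of a keyword."""
    return f"{stem}*"


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Phrase search in quotes; optional ~N sets the slop amount."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fuzzy(word: str, distance: int) -> str:
    """Fuzzy search: ~N after a word sets the edit distance."""
    return f"{word}~{distance}"


def any_of(*terms: str) -> str:
    """OR search, grouped with parentheses to make priority explicit."""
    return "(" + " | ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """AND search (the default), written out with + for clarity."""
    return " +".join(terms)


def exclude(term: str) -> str:
    """NOT operation: - in front of a term."""
    return f"-{term}"


# Example: records matching genom* AND (cancer OR the phrase
# "risk assessment" with slop 2), excluding anything marked offline.
query = all_of(
    prefix("genom"),
    any_of("cancer", phrase("risk assessment", slop=2)),
) + " " + exclude("offline")
print(query)
```

Grouping every OR expression in parentheses is a deliberate choice here: it keeps the priority explicit rather than relying on operator precedence.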
Found 18 result(s)
The National Cancer Data Base (NCDB), a joint program of the Commission on Cancer (CoC) of the American College of Surgeons (ACoS) and the American Cancer Society (ACS), is a nationwide oncology outcomes database for more than 1,500 Commission-accredited cancer programs in the United States and Puerto Rico. Some 70 percent of all newly diagnosed cases of cancer in the United States are captured at the institutional level and reported to the NCDB. The NCDB, begun in 1989, now contains approximately 29 million records from hospital cancer registries across the United States. Data on all types of cancer are tracked and analyzed. These data are used to explore trends in cancer care, to create regional and state benchmarks for participating hospitals, and to serve as the basis for quality improvement.
The Diabetes Study of Northern California (DISTANCE) conducts epidemiological and health services research in diabetes among a large, multiethnic cohort of patients in a large, integrated health care delivery system.
The Toxics Release Inventory (TRI) is a set of publicly available databases containing information on releases of specific toxic chemicals and their management as waste, as reported annually by U.S. industrial and federal facilities.
The Fragile Families & Child Wellbeing Study is following a cohort of nearly 5,000 children born in large U.S. cities between 1998 and 2000 (roughly three-quarters of whom were born to unmarried parents). We refer to unmarried parents and their children as “fragile families” to underscore that they are families and that they are at greater risk of breaking up and living in poverty than more traditional families. The core Study was originally designed to primarily address four questions of great interest to researchers and policy makers: (1) What are the conditions and capabilities of unmarried parents, especially fathers?; (2) What is the nature of the relationships between unmarried parents?; (3) How do children born into these families fare?; and (4) How do policies and environmental conditions affect families and children?
Project Achilles is a systematic effort aimed at identifying and cataloging genetic vulnerabilities across hundreds of genomically characterized cancer cell lines. The project uses genome-wide genetic perturbation reagents (shRNAs or Cas9/sgRNAs) to silence or knock-out individual genes and identify those genes that affect cell survival. Large-scale functional screening of cancer cell lines provides a complementary approach to those studies that aim to characterize the molecular alterations (e.g. mutations, copy number alterations) of primary tumors, such as The Cancer Genome Atlas (TCGA). The overall goal of the project is to identify cancer genetic dependencies and link them to molecular characteristics in order to prioritize targets for therapeutic development and identify the patient population that might benefit from such targets.
Giardia lamblia is a significant, environmentally transmitted, human pathogen and an amitochondriate protist. It is a major contributor to the enormous worldwide burden of human diarrheal diseases, yet the basic biology of this parasite is not well understood. No virulence factor has been identified. The Giardia lamblia genome contains only 12 million base pairs distributed onto five chromosomes. Its analysis promises to provide insights about the origins of nuclear genome organization, the metabolic pathways used by parasitic protists, and the cellular biology of host interaction and avoidance of host immune systems. Since the divergence of Giardia lamblia lies close to the transition between eukaryotes and prokaryotes in universal ribosomal RNA phylogenies, it is a valuable, if not unique, model for gaining basic insights into genetic innovations that led to formation of eukaryotic cells. In evolutionary terms, the divergence of this organism is at least twice as ancient as the common ancestor for yeast and man. A detailed study of its genome will provide insights into an early evolutionary stage of eukaryotic chromosome organization as well as other aspects of the prokaryotic / eukaryotic divergence.
The Malaria Atlas Project (MAP) brings together researchers based around the world with expertise in a wide range of disciplines from public health to mathematics, geography and epidemiology. We work together to generate new and innovative methods of mapping malaria risk. Ultimately our goal is to produce a comprehensive range of maps and estimates that will support effective planning of malaria control at national and international scales.
CTD contains manually curated data describing cross-species chemical-gene/protein interactions and chemical- and gene-disease relationships. The results provide insight into the molecular mechanisms underlying variable susceptibility and environmentally influenced diseases. These data will also provide insights into complex chemical-gene and protein interaction networks.
ITER contains data in support of human health risk assessments. It is compiled by Toxicology Excellence for Risk Assessment (TERA) and contains data from CDC/ATSDR, Health Canada, RIVM, U.S. EPA, IARC, NSF International and independent parties offering peer-reviewed risk values. ITER provides comparison charts of international risk assessment information and explains differences in risk values derived by different organizations.
A machine learning data repository with interactive visual analytic techniques. This project is the first to combine the notion of a data repository with real-time visual analytics for interactive data mining and exploratory analysis on the web. State-of-the-art statistical techniques are combined with real-time data visualization, allowing researchers to seamlessly find, explore, understand, and discover key insights in a large number of publicly donated data sets. This large, comprehensive collection of data is useful for making significant research findings and provides benchmark data sets for a wide variety of applications and domains. It includes relational, attributed, heterogeneous, streaming, spatial, and time series data as well as non-relational machine learning data. All data sets can be easily downloaded in a standard, consistent format. We have also built a multi-level interactive visual analytics engine that allows users to visualize and interactively explore the data in a free-flowing manner.
!! OFFLINE !! A recent computer security audit has revealed security flaws in the legacy HapMap site that require NCBI to take it down immediately. We regret the inconvenience, but we are required to do this. That said, NCBI was planning to decommission this site in the near future anyway (although not quite so suddenly), as the 1,000 genomes (1KG) project has established itself as a research standard for population genetics and genomics. NCBI has observed a decline in usage of the HapMap dataset and website with its available resources over the past five years and it has come to the end of its useful life. The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States. All of the information generated by the Project will be released into the public domain. The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. By making this information freely available, the Project will help biomedical researchers find genes involved in disease and responses to therapeutic drugs. In the initial phase of the Project, genetic data are being gathered from four populations with African, Asian, and European ancestry. Ongoing interactions with members of these populations are addressing potential ethical issues and providing valuable experience in conducting research with identified populations. Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints. 
The Project officially started with a meeting in October 2002 and is expected to take about three years.
The PRO-ACT platform houses the largest ALS clinical trials dataset ever created. It is a powerful tool for biomedical researchers, statisticians, clinicians, or anyone else interested in "Big Data." PRO-ACT merges data from existing public and private clinical trials, generating an invaluable resource for the design of future ALS clinical trials. The database will also contribute to the identification of unique observations, novel correlations, and patterns of ALS disease progression, as well as a variety of still unconsidered analyses. More than 600,000 people around the world are battling ALS. The disease strikes indiscriminately, and typically patients will die within 2-5 years following diagnosis. Currently, there are no effective treatments or a cure for ALS. Users of PRO-ACT are helping to accelerate the discovery, development, and delivery of ALS treatments, which will provide hope to patients and their families.
IRIS contains data in support of human health risk assessment, including hazard identification and dose-response assessments. It is compiled by the U.S. EPA and contains descriptive and quantitative information related to human cancer and non-cancer health effects that may result from exposure to substances in the environment. IRIS data is reviewed by EPA scientists and represents EPA consensus.
The Mexican Health and Aging Study (MHAS) started as a prospective panel study of health and aging in Mexico. MHAS is nationally representative of the 13 million Mexicans born prior to 1951. The survey has national and urban/rural representation. The baseline survey, in 2001, included a nationally representative sample of Mexicans aged 50 and over and their spouses/partners regardless of their age. A direct interview was sought with each individual, and proxy interviews were obtained when poor health or temporary absence precluded a direct interview. The sample was distributed across all 32 states of the country in urban and rural areas. Households in the six states which account for 40% of all migrants to the U.S. were over-sampled. A sub-sample was selected to obtain anthropometric measures.
LifeMap Discovery® is a compendium of embryonic development for stem cell research and regenerative medicine, constructed by integrating extensive molecular, cellular, anatomical and medical data curated from scientific literature and high-throughput data sources.
Collection of various motion capture recordings (walking, dancing, sports, and others) performed by over 140 subjects. The database contains free motions which you can download and use. There is a zip file of all asf/amc's on the FAQs page.
The DNA Bank Network was established in spring 2007 and was funded until 2011 by the German Research Foundation (DFG). The network was initiated by GBIF Germany (Global Biodiversity Information Facility). Its concept is unique worldwide. DNA bank databases of all partners are linked and are accessible via a central web portal, providing DNA samples of complementary collections (microorganisms, protists, plants, algae, fungi and animals). The DNA Bank Network was one of the founders of the Global Genome Biodiversity Network (GGBN) and is fully merged with GGBN today. GGBN agreed on using the data model proposed by the DNA Bank Network. The Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM) hosts the technical secretariat of GGBN and its virtual infrastructure. The main focus of the DNA Bank Network is to enhance taxonomic, systematic, genetic, conservation and evolutionary studies by providing:
  • high-quality, long-term storage of DNA material on which molecular studies have been performed, so that results can be verified, extended, and complemented
  • complete online documentation of each sample, including the provenance of the original material, the place of voucher deposit, information about DNA quality and extraction methodology, digital images of vouchers, and links to published molecular data if available