  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) imply priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
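The operators listed above can be combined into a single query string. The following is a minimal sketch in Python, assuming only the syntax rules stated in the list. The helper names (prefix, phrase, fuzzy, sloppy, any_of, all_of, exclude) are hypothetical and are not part of any API offered by this site, and the exact whitespace and parsing rules may differ depending on the search backend.

```python
# Hypothetical helpers that compose query strings from the operators
# described in the list above. They only build text; they do not
# contact any search service.

def prefix(stem: str) -> str:
    """Wildcard search: * at the end of a keyword."""
    return f"{stem}*"

def phrase(text: str) -> str:
    """Phrase search: wrap the text in double quotes."""
    return f'"{text}"'

def fuzzy(word: str, distance: int) -> str:
    """Fuzzy search: ~N after a word sets the edit distance."""
    return f"{word}~{distance}"

def sloppy(text: str, slop: int) -> str:
    """Sloppy phrase search: ~N after a phrase sets the slop amount."""
    return f"{phrase(text)}~{slop}"

def any_of(*terms: str) -> str:
    """OR search, grouped with parentheses to set priority."""
    return "(" + " | ".join(terms) + ")"

def all_of(*terms: str) -> str:
    """AND search (the default operator, written out explicitly here)."""
    return " + ".join(terms)

def exclude(term: str) -> str:
    """NOT operation: prefix the term with a minus sign."""
    return f"-{term}"

# Example: records matching the prefix "genotyp", and either the phrase
# "model organism" or a near-match of "zebrafish", but not "human".
query = all_of(
    prefix("genotyp"),
    any_of(phrase("model organism"), fuzzy("zebrafish", 1)),
    exclude("human"),
)
print(query)
```

Because AND is the default, the explicit + signs could likely be dropped; they are kept here only to make each operator visible.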
Found 25 result(s)
A database for plant breeders and researchers to combine, visualize, and interrogate the wealth of phenotype and genotype data generated by the Triticeae Coordinated Agricultural Project (TCAP).
The NCBI database of Genotypes and Phenotypes archives and distributes the results of studies that have investigated the interaction of genotype and phenotype, including genome-wide association studies, medical sequencing, molecular diagnostic assays, and association between genotype and non-clinical traits. The database provides summaries of studies, the contents of measured variables, and original study document text. dbGaP provides two types of access for users, open and controlled. Through the controlled access, users may access individual-level data such as phenotypic data tables and genotypes.
The database contains all the variants published as pathogenic mutations in the international literature up to November 2007. In addition, unpublished Usher mutations and non-pathogenic variants from the laboratory of Montpellier have been included.
The NCBI Short Genetic Variations database, commonly known as dbSNP, catalogs short variations in nucleotide sequences from a wide range of organisms. These variations include single nucleotide variations, short nucleotide insertions and deletions, short tandem repeats and microsatellites. Short Genetic Variations may be common, thus representing true polymorphisms, or they may be rare. Some rare human entries have additional information associated with them, including disease associations, genotype information and allele origin, as some variations are somatic rather than germline events. ***NCBI will phase out support for non-human organism data in dbSNP and dbVar beginning on September 1, 2017***
>>>!!!<<< 08.08.2019: PLEXdb is no longer online, URLold: >>>!!!<<< >>>!!!<<< 13.12.2018: PLEXdb is now a static site after NSF funding stopped. We have stopped registration of new users, but past users who have data can log in when needed and interact with the site. You can download data using the authentication provided at the download page. >>>!!!<<< PLEXdb is a unified gene expression resource for plants and plant pathogens. PLEXdb is a genotype-to-phenotype, hypothesis-building information warehouse, leveraging highly parallel expression data with seamless portals to related genetic, physical, and pathway data.
The Japanese Genotype-phenotype Archive (JGA) is a service for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects. The JGA contains exclusive data collected from individuals whose consent agreements authorize data release only for specific research use or to bona fide researchers. Strict protocols govern how information is managed, stored and distributed by the JGA. Once processed, all data are encrypted. Users can contact the JGA team from here. JGA services are provided in collaboration with the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency.
The Drosophila Genetic Reference Panel (DGRP) is a population consisting of more than 200 inbred lines derived from the Raleigh, USA population. The DGRP is a living library of common polymorphisms affecting complex traits, and a community resource for whole genome association mapping of quantitative trait loci.
DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases. DisGeNET integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype–phenotype relationships.
A small genotype data repository containing data used in recent papers from the Estonian Biocentre. Most of the data pertains to human population genetics. PDF files of the papers are also freely available.
FaceBase is a collaborative NIDCR-funded project that houses comprehensive data in support of advancing research into craniofacial development and malformation. It serves as a community resource by curating large datasets of a variety of types from the craniofacial research community and sharing them via this website. Practices emphasize a comprehensive and multidisciplinary approach to understanding the developmental processes that create the face. The data offered spotlights high-throughput genetic, molecular, biological, imaging and computational techniques. One of the missions of this project is to facilitate cooperation and collaboration between the central coordinating center (i.e., the Hub) and the craniofacial research community.
ALSoD is a freely available database that has been transformed from a single gene storage facility recording mutations in the SOD1 gene to a multigene ALS bioinformatics repository and analytical instrument combining genotype, phenotype, and geographical information with associated analysis tools. These include a comparison tool to evaluate genes side by side or jointly with user configurable features, a pathogenicity prediction tool using a combination of computational approaches to distinguish variants with nonfunctional characteristics from disease-associated mutations with more dangerous consequences, and a credibility tool to enable ALS researchers to objectively assess the evidence for gene causation in ALS. Furthermore, integration of external tools, systems for feedback, annotation by users, and two-way links to collaborators hosting complementary databases further enhance the functionality of ALSoD.
ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation. ClinVar processes submissions reporting variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar then presents the data for interactive users as well as those wishing to use ClinVar in daily workflows and other local applications. ClinVar works in collaboration with interested organizations to meet the needs of the medical genetics community as efficiently and effectively as possible.
The European Genome-phenome Archive (EGA) is designed to be a repository for all types of sequence and genotype experiments, including case-control, population, and family studies. We will include SNP and CNV genotypes from array based methods and genotyping done with re-sequencing methods. The EGA will serve as a permanent archive that will archive several levels of data including the raw data (which could, for example, be re-analysed in the future by other algorithms) as well as the genotype calls provided by the submitters. We are developing data mining and access tools for the database. For controlled access data, the EGA will provide the necessary security required to control access, and maintain patient confidentiality, while providing access to those researchers and clinicians authorised to view the data. In all cases, data access decisions will be made by the appropriate data access-granting organisation (DAO) and not by the EGA. The DAO will normally be the same organisation that approved and monitored the initial study protocol or a designate of this approving organisation. The European Genome-phenome Archive (EGA) allows you to explore datasets from genomic studies, provided by a range of data providers. Access to datasets must be approved by the specified Data Access Committee (DAC).
The Cystic Fibrosis Mutation Database (CFTR1) was initiated by the Cystic Fibrosis Genetic Analysis Consortium in 1989 to increase and facilitate communications among CF researchers, and is maintained by the Cystic Fibrosis Centre at the Hospital for Sick Children in Toronto. The specific aim of the database is to provide up to date information about individual mutations in the CFTR gene. In a major upgrade in 2010, all known CFTR mutations and sequence variants have been converted to the standard nomenclature recommended by the Human Genome Variation Society.
The Drosophila Synthetic Population Resource (DSPR) consists of a new panel of over 1700 recombinant inbred lines (RILs) of Drosophila melanogaster, derived from two highly recombined synthetic populations, each created by intercrossing a different set of 8 inbred founder lines (with one founder line common to both populations). Complete genome sequence data for the founder lines are available, and in addition, there is a high resolution genetic map for each RIL. The DSPR has been developed as a community resource for high-resolution QTL mapping and is intended to be used widely by the Drosophila community.
The Maize Genetics and Genomics Database focuses on collecting data related to the crop plant and model organism Zea mays. The project's goals are to synthesize, display, and provide access to maize genomics and genetics data, prioritizing mutant and phenotype data and tools, structural and genetic map sets, and gene models. MaizeGDB also aims to make the Maize Newsletter available, and provide support services to the community of maize researchers. MaizeGDB is working with the Schnable lab, the Panzea project, The Genome Reference Consortium, and iPlant Collaborative to create a plan for archiving, disseminating, visualizing, and analyzing diversity data. MaizeGDB is short for Maize Genetics/Genomics Database. It is a USDA/ARS funded project to integrate the data found in MaizeDB and ZmDB into a single schema, develop an effective interface to access this data, and develop additional tools to make data analysis easier. Our goal in the long term is a true next-generation online maize database.
GWAS Central (previously the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary level findings from genetic association studies, both large and small. We actively gather datasets from public domain projects, and encourage direct data submission from the community.
The Allele Frequency Net Database (AFND) is a public database which contains frequency information of several immune genes such as Human Leukocyte Antigens (HLA), Killer-cell Immunoglobulin-like Receptors (KIR), Major histocompatibility complex class I chain-related (MIC) genes, and a number of cytokine gene polymorphisms. The Allele Frequency Net Database (AFND) provides a central source, freely available to all, for the storage of allele frequencies from different polymorphic areas in the Human Genome. Users can contribute the results of their work into one common database and can perform database searches on information already available. We have currently collected data in allele, haplotype and genotype format. However, the success of this website will depend on you to contribute your data.
ZFIN serves as the zebrafish model organism database. The long term goals for ZFIN are a) to be the community database resource for the laboratory use of zebrafish, b) to develop and support integrated zebrafish genetic, genomic and developmental information, c) to maintain the definitive reference data sets of zebrafish research information, d) to link this information extensively to corresponding data in other model organism and human databases, e) to facilitate the use of zebrafish as a model for human biology and f) to serve the needs of the research community. ZIRC is the Zebrafish International Resource Center, an independent NIH-funded facility providing a wide range of zebrafish lines, probes and health services. ZFIN works closely with ZIRC to connect our genetic data with available probes and fish lines.
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies.
The IMPC is a confederation of international mouse phenotyping projects working towards the agreed goals of the consortium: to undertake the phenotyping of 20,000 mouse mutants over a ten-year period, providing the first functional annotation of a mammalian genome; to maintain and expand a worldwide consortium of institutions with the capacity and expertise to produce germ line transmission of targeted knockout mutations in embryonic stem cells for 20,000 known and predicted mouse genes; to test each mutant mouse line through a broad-based primary phenotyping pipeline in all the major adult organ systems and most areas of major human disease; through this activity and employing data annotation tools, to systematically aim to discover and ascribe biological function to each gene, driving new ideas and underpinning future research into biological systems; to maintain and expand collaborative “networks” with specialist phenotyping consortia or laboratories, providing standardized secondary-level phenotyping that enriches the primary dataset, and end-user, project-specific tertiary-level phenotyping that adds value to the mammalian gene functional annotation and fosters hypothesis-driven research; and to provide a centralized data centre and portal for free, unrestricted access to primary and secondary data by the scientific community, promoting sharing of data, genotype-phenotype annotation, standard operating protocols, and the development of open source data analysis tools. Members of the IMPC may include research centers, funding organizations and corporations.
The Expression Atlas provides information on gene expression patterns under different biological conditions such as a gene knock out, a plant treated with a compound, or in a particular organism part or cell. It includes both microarray and RNA-seq data. The data is re-analysed in-house to detect interesting expression patterns under the conditions of the original experiment. There are two components to the Expression Atlas, the Baseline Atlas and the Differential Atlas. The Baseline Atlas displays information about which gene products are present (and at what abundance) in "normal" conditions (e.g. tissue, cell type). It aims to answer questions such as "which genes are specifically expressed in human kidney?". This component of the Expression Atlas consists of highly-curated and quality-checked RNA-seq experiments from ArrayExpress. It has data for many different animal and plant species. New experiments are added as they become available. The Differential Atlas allows users to identify genes that are up- or down-regulated in a wide variety of different experimental conditions such as yeast mutants, cadmium treated plants, cystic fibrosis or the effect on gene expression of mind-body practice. Both microarray and RNA-seq experiments are included in the Differential Atlas. Experiments are selected from ArrayExpress and groups of samples are manually identified for comparison e.g. those with wild type genotype compared to those with a gene knock out. Each experiment is processed through our in-house differential expression statistical analysis pipeline to identify genes with a high probability of differential expression.
As with most biomedical databases, the first step is to identify relevant data from the research community. The Monarch Initiative is focused primarily on phenotype-related resources. We bring in data associated with those phenotypes so that our users can begin to make connections among other biological entities of interest. We import data from a variety of data sources. With many resources integrated into a single database, we can join across the various data sources to produce integrated views. We have started with the big players including ClinVar and OMIM, but are equally interested in boutique databases. You can learn more about the sources of data that populate our system from our data sources page.