Reset all


Content Types


AID systems



Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type


Metadata standards

PID systems

Provider types

Quality management

Repository languages



Repository types


  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 90 result(s)
The NCBI database of Genotypes and Phenotypes archives and distributes the results of studies that have investigated the interaction of genotype and phenotype, including genome-wide association studies, medical sequencing, molecular diagnostic assays, and association between genotype and non-clinical traits. The database provides summaries of studies, the contents of measured variables, and original study document text. dbGaP provides two types of access for users, open and controlled. Through the controlled access, users may access individual-level data such as phenotypic data tables and genotypes.
IntEnz contains the recommendation of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the nomenclature and classification of enzyme-catalyzed reactions. Users can browse by enzyme classification or use advanced search options to search enzymes by class, subclass and sub-subclass information.
AceView provides a curated, comprehensive and non-redundant sequence representation of all public mRNA sequences (mRNAs from GenBank or RefSeq, and single pass cDNA sequences from dbEST and Trace). These experimental cDNA sequences are first co-aligned on the genome then clustered into a minimal number of alternative transcript variants and grouped into genes. Using exhaustively and with high quality standards the available cDNA sequences evidences the beauty and complexity of mammals’ transcriptome, and the relative simplicity of the nematode and plant transcriptomes. Genes are classified according to their inferred coding potential; many presumably non-coding genes are discovered. Genes are named by Entrez Gene names when available, else by AceView gene names, stable from release to release. Alternative features (promoters, introns and exons, polyadenylation signals) and coding potential, including motifs, domains, and homologies are annotated in depth; tissues where expression has been observed are listed in order of representation; diseases, phenotypes, pathways, functions, localization or interactions are annotated by mining selected sources, in particular PubMed, GAD and Entrez Gene, and also by performing manual annotation, especially in the worm. In this way, both the anatomy and physiology of the experimentally cDNA supported human, mouse and nematode genes are thoroughly annotated.
The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored within the DIP database were curated, both, manually by expert curators and also automatically using computational approaches that utilize the the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. Please, check the reference page to find articles describing the DIP database in greater detail. The Database of Ligand-Receptor Partners (DLRP) is a subset of DIP (Database of Interacting Proteins). The DLRP is a database of protein ligand and protein receptor pairs that are known to interact with each other. By interact we mean that the ligand and receptor are members of a ligand-receptor complex and, unless otherwise noted, transduce a signal. In some instances the ligand and/or receptor may form a heterocomplex with other ligands/receptors in order to be functional. We have entered the majority of interactions in DLRP as full DIP entries, with links to references and additional information
The dbMHC database provides an open, publicly accessible platform for DNA and clinical data related to the human Major Histocompatibility Complex (MHC). The dbMHC provides access to human leukocyte antigen (HLA) sequences, HLA allele and haplotype frequencies, and clinical datasets.
The PeptideAtlas validates expressed proteins to provide eukaryotic genome data. Peptide Atlas provides data to advance biological discoveries in humans. The PeptideAtlas accepts proteomic data from high-throughput processes and encourages data submission.
The MG-RAST server is an open source system for annotation and comparative analysis of metagenomes. Users can upload raw sequence data in fasta format; the sequences will be normalized and processed and summaries automatically generated. The server provides several methods to access the different data types, including phylogenetic and metabolic reconstructions, and the ability to compare the metabolism and annotations of one or more metagenomes and genomes. In addition, the server offers a comprehensive search capability. Access to the data is password protected, and all data generated by the automated pipeline is available for download in a variety of common formats. MG-RAST has become an unofficial repository for metagenomic data, providing a means to make your data public so that it is available for download and viewing of the analysis without registration, as well as a static link that you can use in publications. It also requires that you include experimental metadata about your sample when it is made public to increase the usefulness to the community.
Greengenes is an Earth Sciences website that assists clinical and environmental microbiologists from around the globe in classifying microorganisms from their local environments. A 16S rRNA gene database addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
The Japanese Genotype-phenotype Archive (JGA) is a service for permanent archiving and sharing of all types of individual-level genetic and de-identified phenotypic data resulting from biomedical research projects. The JGA contains exclusive data collected from individuals whose consent agreements authorize data release only for specific research use or to bona fide researchers. Strict protocols govern how information is managed, stored and distributed by the JGA. Once processed, all data are encrypted. Users can contact the JGA team from here. JGA services are provided in collaboration with National Bioscience Database Center (NBDC) of Japan Science and Technology Agency.
VectorBase provides data on arthropod vectors of human pathogens. Sequence data, gene expression data, images, population data, and insecticide resistance data for arthropod vectors are available for download. VectorBase also offers genome browser, gene expression and microarray repository, and BLAST searches for all VectorBase genomes. VectorBase Genomes include Aedes aegypti, Anopheles gambiae, Culex quinquefasciatus, Ixodes scapularis, Pediculus humanus, Rhodnius prolixus. VectorBase is one the Bioinformatics Resource Centers (BRC) projects which is funded by National Institute of Allergy and Infectious Diseases (NAID).
The Wellcome Trust Sanger Institute is a charitably funded genomic research centre located in Hinxton, nine miles south of Cambridge in the UK. We study diseases that have an impact on health globally by investigating genomes. Building on our past achievements and based on priorities that exploit the unique expertise of our Faculty of researchers, we will lead global efforts to understand the biology of genomes. We are convinced of the importance of making this research available and accessible for all audiences. reduce global health burdens.
The repository is no longer available <<<!!!<<< TOXNET has moved. Most content will continue to be collected and reviewed; selected information is accessible through PubChem, PubMed, and Bookshelf. If you have questions, please contact NLM Customer Support at >>>!!!>>>
The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life that are found in all organisms including bacteria, yeast, plants, flies, other animals, and humans. Understanding the shape of a molecule helps to understand how it works. This knowledge can be used to help deduce a structure's role in human health and disease, and in drug development. The structures in the archive range from tiny proteins and bits of DNA to complex molecular machines like the ribosome.
GENCODE is a scientific project in genome research and part of the ENCODE (ENCyclopedia Of DNA Elements) scale-up project. The GENCODE consortium was initially formed as part of the pilot phase of the ENCODE project to identify and map all protein-coding genes within the ENCODE regions (approx. 1% of Human genome). Given the initial success of the project, GENCODE now aims to build an “Encyclopedia of genes and genes variants” by identifying all gene features in the human and mouse genome using a combination of computational analysis, manual annotation, and experimental validation, and annotating all evidence-based gene features in the entire human genome at a high accuracy.
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological macromolecules that serves a global community of researchers, educators, and students. The data contained in the archive include atomic coordinates, crystallographic structure factors and NMR experimental data. Aside from coordinates, each deposition also includes the names of molecules, primary and secondary structure information, sequence database references, where appropriate, and ligand and biological assembly information, details about data collection and structure solution, and bibliographic citations. The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as deposition, data processing and distribution centers for PDB data. Members are: RCSB PDB (USA), PDBe (Europe) and PDBj (Japan), and BMRB (USA). The wwPDB's mission is to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community.
The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources. These include submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centres and routine and comprehensive exchange with our partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature.
Cryo electron microscopy enables the determination of 3D structures of macromolecular complexes and cells from 2 to 100 Å resolution. EMDataResource is the unified global portal for one-stop deposition and retrieval of 3DEM density maps, atomic models and associated metadata, and is a joint effort among investigators of the Stanford/SLAC CryoEM Facility and the Research Collaboratory for Structural Bioinformatics (RCSB) at Rutgers, in collaboration with the EMDB team at the European Bioinformatics Institute. EMDataResource also serves as a resource for news, events, software tools, data standards, and validation methods for the 3DEM community. The major goal of the EMDataResource project in the current funding period is to work with the 3DEM community to (1) establish data-validation methods that can be used in the process of structure determination, (2) define the key indicators of a well-determined structure that should accompany every deposition, and (3) implement appropriate validation procedures for maps and map-derived models into a 3DEM validation pipeline.
The dbVar is a database of genomic structural variation containing data from multiple gene studies. Users can browse data containing the number of variant cells from each study, and filter studies by organism, study type, method and genomic variant. Organisms include human, mouse, cattle and several additional animals. ***NCBI will phase out support for non-human organism data in dbSNP and dbVar beginning on September 1, 2017 ***
The IPD-IMGT/HLA Database provides a specialist database for sequences of the human major histocompatibility complex (MHC) and includes the official sequences named by the WHO Nomenclature Committee For Factors of the HLA System. The IPD-IMGT/HLA Database is part of the international ImMunoGeneTics project (IMGT). The database uses the 2010 naming convention for HLA alleles in all tools herein. To aid in the adoption of the new nomenclature, all search tools can be used with both the current and pre-2010 allele designations. The pre-2010 nomenclature designations are only used where older reports or outputs have been made available for download.
The Taenia solium genome project is a whole genome sequencing project of the parasite Taenia solium, the causal agent of human and porcine cysticercosis; a disease that is still a public health problem of relevance in Mexico. It is being carried out by a consortium of scientists belonging to diverse institutions of the Universidad Nacional Autónoma de México (UNAM, the National Autonomous University of Mexico).
Complete Genomics provides free public access to a variety of whole human genome data sets generated from Complete Genomics’ sequencing service. The research community can explore and familiarize themselves with the quality of these data sets, review the data formats provided from our sequencing service, and augment their own research with additional summaries of genomic variation across a panel of diverse individuals. The quality of these data sets is representative of what a customer can expect to receive for their own samples. This public genome repository comprises genome results from both our Standard Sequencing Service (69 standard, non-diseased samples) and the Cancer Sequencing Service (two matched tumor and normal sample pairs). In March 2013 Complete Genomics was acquired by BGI-Shenzhen , the world’s largest genomics services company. BGI is a company headquartered in Shenzhen, China that provides comprehensive sequencing and bioinformatics services for commercial science, medical, agricultural and environmental applications. Complete Genomics is now focused on building a new generation of high-throughput sequencing technology and developing new and exciting research, clinical and consumer applications.
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies.
The NCI’s Cancer Genome Anatomy Project (CGAP) is an online resource designed to provide the scientific community with detailed characterization of gene expression in biological tissues. By characterizing normal, pre-cancer and cancer cells, CGAP aims to improve detection, diagnosis and treatment for the patient. Moreover, CGAP provides access to cDNA clones to the research community through a variety of distributors. CGAP provides a wide range of genomic data and resources
The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with comprehensive information o­n the approximately 700 prokaryote species that are present in the human oral cavity. Approximately 49% are officially named, 17% unnamed (but cultivated) and 34% are known o­nly as uncultivated phylotypes. The HOMD presents a provisional naming scheme for the currently unnamed species so that strain, clone, and probe data from any laboratory can be directly linked to a stably named reference scheme. The HOMD links sequence data with phenotypic, phylogenetic, clinical, and bibliographic information. Genome sequences for oral bacteria determined as part of this project, the Human Microbiome Project, and other sequencing projects are being added to the HOMD as they become available. Genomes for 315 oral taxa (46% of taxa o­n HOMD) are currently available o­n HOMD. The HOMD site offers easy to use tools for viewing all publically available oral bacterial genomes.