  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
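The operators above can be combined into a single query string. The following is a minimal sketch in Python that only assembles such strings; the helper names (`phrase`, `prefix`, `fuzzy`, `sloppy`, `any_of`, `build_query`) are hypothetical and chosen for illustration, and how the site's search backend ranks or interprets the resulting query is not confirmed here.

```python
# Hypothetical helpers that assemble a query string from the operators
# listed above. They only build text; nothing is sent to any server.

def phrase(text: str) -> str:
    """Wrap text in double quotes so it is searched as a phrase."""
    return f'"{text}"'

def prefix(stem: str) -> str:
    """Append * to a keyword for a wildcard search."""
    return f"{stem}*"

def fuzzy(word: str, distance: int) -> str:
    """Append ~N to a word to set the desired edit distance."""
    return f"{word}~{distance}"

def sloppy(text: str, amount: int) -> str:
    """Append ~N to a quoted phrase to set the desired slop amount."""
    return f"{phrase(text)}~{amount}"

def any_of(*terms: str) -> str:
    """Join terms with | (OR) and group them with parentheses."""
    return "(" + " | ".join(terms) + ")"

def build_query(required: list[str], excluded: list[str]) -> str:
    """Join required terms with + (AND) and prefix excluded terms with -."""
    query = " + ".join(required)
    for term in excluded:
        query += f" -{term}"
    return query

query = build_query(
    required=[phrase("genome portal"), any_of(prefix("plant"), fuzzy("fungi", 1))],
    excluded=["cancer"],
)
print(query)  # "genome portal" + (plant* | fungi~1) -cancer
```

Because AND is the default, the explicit `+` could probably be dropped; it is kept here so that each operator stays visible in the output.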
Found 31 result(s)
Database and knowledgebase of authenticated microbial genomics data with full data provenance to physical materials held within American Type Culture Collection's (ATCC) biorepository and culture collections. Data includes whole genome sequencing data for bacterial, viral and fungal strains at ATCC, their genome assemblies, metadata, drug susceptibility data, and more. All data is freely available for non-commercial research use only (RUO) applications via the web portal interface or via a REST-API. The goal is to provide the research community with provenance information and authentication between the biological source materials and reference genome assemblies derived from them.
Genome track alignments using GBrowse on this site feature: (1) Annotated and predicted genes and transcripts; (2) QTL / SNP Association tracks; (3) OMIA genes; (4) Various SNP Chip tracks; (5) Other mapping features or elements that are available.
The Department of Energy (DOE) Joint Genome Institute (JGI) is a national user facility with massive-scale DNA sequencing and analysis capabilities dedicated to advancing genomics for bioenergy and environmental applications. Beyond generating tens of trillions of DNA bases annually, the Institute develops and maintains data management systems and specialized analytical capabilities to manage and interpret complex genomic data sets, and to enable an expanding community of users around the world to analyze these data in different contexts over the web. The JGI Genome Portal provides a unified access point to all JGI genomic databases and analytical tools. A user can find all DOE JGI sequencing projects and their status, search for and download assemblies and annotations of sequenced genomes, and interactively explore those genomes and compare them with other sequenced microbes, fungi, plants or metagenomes using specialized systems tailored to each particular class of organisms. Databases: Genome Online Database (GOLD), Integrated Microbial Genomes (IMG), MycoCosm, Phytozome
The Hymenoptera Genome Database is a genome informatics resource that supports the research of insects of the order Hymenoptera (e.g. bees, wasps, ants). HGD provides tools for data mining (HymenopteraMine), sequence searching (BLAST), genome browsing (JBrowse), genome annotation (Apollo) and data download. Available through the navigation bar on the HGD Home page are the archives Ant Genomes Portal, BeeBase, and NasoniaBase which will not be updated.
The U.S. Department of Energy (DOE) Joint Genome Institute (JGI) is a DOE Office of Science User Facility located at Lawrence Berkeley National Laboratory (Berkeley Lab). All data generated by the DOE Joint Genome Institute is available through this repository once the data are published or public.
Phytozome is the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute. Families of related genes representing the modern descendants of ancestral genes are constructed at key phylogenetic nodes. These families allow easy access to clade-specific orthology/paralogy relationships as well as insights into clade-specific novelties and expansions.
The Canadian VirusSeq Data Portal (CVDP) is an open-access data portal funded by Genome Canada. It is intended to facilitate access to Canadian SARS-CoV-2 sequences and associated non-sensitive metadata adhering to the FAIR Data principles. Limited contextual metadata and viral genome sequences can be shared among Canadian public health labs, researchers and other groups interested in accessing the data for surveillance, research, and innovation purposes. The CVDP will harmonize, validate, and automate submission to international databases and enable the creation of real-time dashboards that summarize the Canadian data contributions while facilitating exploration and access. Sequences or metadata submitted to the CVDP may not include data that could reveal the personal identity of the source. It is part of the Canadian COVID Genomics Network (CanCOGeN).
The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high level sequence analysis of the tumor genomes. The Data Coordinating Center (DCC) is the central provider of TCGA data. The DCC standardizes data formats and validates submitted data.
The International Human Epigenome Consortium (IHEC) makes available comprehensive sets of reference epigenomes relevant to health and disease. The IHEC Data Portal can be used to view, search and download the data already released by the different IHEC-associated projects.
Pathogen Portal is a repository linking to the Bioinformatics Resource Centers (BRCs) sponsored by the National Institute of Allergy and Infectious Diseases (NIAID) and maintained by The Virginia Bioinformatics Institute. The BRCs provide web-based resources to the scientific community conducting basic and applied research on organisms considered potential agents of biowarfare or bioterrorism or causing emerging or re-emerging diseases. The Pathogen Portal supports and links to five BRCs. Each BRC specializes in a different group of pathogens, focusing on, but not limited to, pathogens causing (Re-)Emerging Infectious Diseases, and those in the NIAID Category A-C Priority Pathogen lists for biodefense research. The scope of the BRCs also includes Invertebrate Vectors of Human Disease. Pathogen Portal covers EuPathDB, IRD, PATRIC, VectorBase and ViPR.
<<<!!!<<< This repository is no longer available. >>>!!!>>> PATRIC will go offline by mid-December 2022. Here is what you need to know. As announced previously, PATRIC, the bacterial BRC, and IRD / ViPR, the viral BRCs, are being merged into the new Bacterial and Viral Bioinformatics Resource Center (BV-BRC). BV-BRC combines the data, tools, and technologies from these BRCs to provide an integrated resource for bacterial and viral genomics-based infectious disease research.
>>>>!!!!<<<< The Cancer Genomics Hub mission is now completed. The Cancer Genomics Hub was established in August 2011 to provide a repository to The Cancer Genome Atlas, the childhood cancer initiative Therapeutically Applicable Research to Generate Effective Treatments and the Cancer Genome Characterization Initiative. CGHub rapidly grew to be the largest database of cancer genomes in the world, storing more than 2.5 petabytes of data and serving downloads of nearly 3 petabytes per month. As the central repository for the foundational genome files, CGHub streamlined team science efforts as data became as easy to obtain as downloading from a hard drive. The convenient access to Big Data, and the collaborations that CGHub made possible, are now essential to cancer research. That work continues at the NCI's Genomic Data Commons. All files previously stored at CGHub can be found there. The Website for the Genomic Data Commons is here: >>>>!!!!<<<< The Cancer Genomics Hub (CGHub) is a secure repository for storing, cataloging, and accessing cancer genome sequences, alignments, and mutation information from the Cancer Genome Atlas (TCGA) consortium and related projects. Access to CGHub Data: All researchers using CGHub must meet the access and use criteria established by the National Institutes of Health (NIH) to ensure the privacy, security, and integrity of participant data. CGHub also hosts some publicly available data, in particular data from the Cancer Cell Line Encyclopedia. All metadata is publicly available and the catalog of metadata and associated BAMs can be explored using the CGHub Data Browser.
Project Achilles is a systematic effort aimed at identifying and cataloging genetic vulnerabilities across hundreds of genomically characterized cancer cell lines. The project uses genome-wide genetic perturbation reagents (shRNAs or Cas9/sgRNAs) to silence or knock out individual genes and identify those genes that affect cell survival. Large-scale functional screening of cancer cell lines provides a complementary approach to those studies that aim to characterize the molecular alterations (e.g. mutations, copy number alterations) of primary tumors, such as The Cancer Genome Atlas (TCGA). The overall goal of the project is to identify cancer genetic dependencies and link them to molecular characteristics in order to prioritize targets for therapeutic development and identify the patient population that might benefit from such targets. Project Achilles data is hosted on the Cancer Dependency Map Portal (DepMap) where it has been harmonized with our genomics and cellular models data. You can access the latest and all past datasets here:
CorrDB holds cattle data relating to meat production, milk production, growth, health, and other traits. The database is designed to collect all published livestock genetic/phenotypic trait correlation data, with the aim of facilitating genetic network analysis and systems biology studies.
dictyBase is an integrated genetic and literature database that contains published Dictyostelium discoideum literature, genes, expressed sequence tags (ESTs), as well as the chromosomal and mitochondrial genome sequences. Direct access to the genome browser, a BLAST search tool, the Dictyostelium Stock Center, research tools, colleague databases, and much more is just a mouse click away. dictyBase is a genome portal for the Amoebozoa. dictyBase is funded by a grant from the National Institute for General Medical Sciences.
The Chickpea Transcriptome Database (CTDB) has been developed with a view to providing the most comprehensive information about the chickpea transcriptome, the most relevant part of the genome. The database contains information and tools for transcriptome sequence, functional annotation, conserved domain(s), transcription factor families, molecular markers (microsatellites and single nucleotide polymorphisms), comprehensive gene expression and comparative genomics with other legumes. The database is a freely available resource, which provides user scientists/breeders a portal to search, browse and query the data to facilitate functional and applied genomics research in chickpea and other legumes. The current release of the database provides transcriptome sequence from cultivated (Cicer arietinum desi (ICC4958) and kabuli (ICCV2)) and wild (Cicer reticulatum, PI489777) chickpea genotypes.
GnpIS is a multispecies integrative information system dedicated to plant and fungi pests. It bridges genetic and genomic data, allowing researchers access to both genetic information (e.g. genetic maps, quantitative trait loci, association genetics, markers, polymorphisms, germplasms, phenotypes and genotypes) and genomic data (e.g. genomic sequences, physical maps, genome annotation and expression data) for species of agronomical interest. GnpIS is used by both large international projects and plant science departments at the French National Research Institute for Agriculture, Food and Environment. It is regularly improved and released several times per year. GnpIS is accessible through a web portal that lets users browse different types of data either independently through dedicated interfaces or simultaneously using a quick search ('Google-like search') or advanced search (Biomart, Galaxy, Intermine) tools.
The NCI's Genomic Data Commons (GDC) provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC obtains validated datasets from NCI programs in which the strategies for tissue collection couple quantity with high quality. Tools are provided to guide data submissions by researchers and institutions.
Ensembl Plants is a genome-centric portal for plant species. It is developed in coordination with other plant genomics and bioinformatics groups via the EBI's role in the transPLANT consortium.
The DNA Bank Network was established in spring 2007 and was funded until 2011 by the German Research Foundation (DFG). The network was initiated by GBIF Germany (Global Biodiversity Information Facility). Its concept is unique worldwide: the DNA bank databases of all partners are linked and accessible via a central web portal, providing DNA samples of complementary collections (microorganisms, protists, plants, algae, fungi and animals). The DNA Bank Network was one of the founders of the Global Genome Biodiversity Network (GGBN) and is fully merged with GGBN today. GGBN agreed on using the data model proposed by the DNA Bank Network. The Botanic Garden and Botanical Museum Berlin-Dahlem (BGBM) hosts the technical secretariat of GGBN and its virtual infrastructure. The main focus of the DNA Bank Network is to enhance taxonomic, systematic, genetic, conservation and evolutionary studies by providing:
  • high-quality, long-term storage of DNA material on which molecular studies have been performed, so that results can be verified, extended, and complemented
  • complete online documentation of each sample, including the provenance of the original material, the place of voucher deposit, information about DNA quality and extraction methodology, digital images of vouchers and links to published molecular data if available