  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) group terms to set precedence
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
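As an illustration of how the operators above combine, here is a minimal Python sketch that assembles a query string. The helper names (phrase, fuzzy, any_of, all_of, exclude) are hypothetical and not part of any search API; the sketch only builds text and assumes the syntax behaves as listed above.

```python
from typing import Optional


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Quote a phrase; optionally append ~N to allow N words of slop."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fuzzy(word: str, distance: int) -> str:
    """Append ~N to a single word to allow an edit distance of N."""
    return f"{word}~{distance}"


def any_of(*terms: str) -> str:
    """Join terms with | (OR) and wrap them in parentheses for precedence."""
    return "(" + " | ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with spaces; AND is the default, so + can be omitted."""
    return " ".join(terms)


def exclude(term: str) -> str:
    """Prefix a term with - to exclude results containing it."""
    return f"-{term}"


# Hypothetical example query: a sloppy phrase, an OR group, and a NOT term.
query = all_of(
    phrase("metabolic network", slop=2),
    any_of("genom*", fuzzy("proteome", 1)),
    exclude("plant"),
)
print(query)
```

The resulting string can be pasted into the search box; the terms are joined with spaces because AND is the default operator.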
Found 46 result(s)
The Entrez Protein Clusters database contains annotation information, publications, structures and analysis tools for related protein sequences encoded by complete genomes. The data available in the Protein Clusters Database is generated from prokaryotic genomic studies and is intended to assist researchers studying micro-organism evolution as well as other biological sciences. Available genomes include plants and viruses as well as organelles and microbial genomes.
DEG hosts records of currently available essential genomic elements, such as protein-coding genes and non-coding RNAs, among bacteria, archaea and eukaryotes. Essential genes in a bacterium constitute a minimal genome, forming a set of functional modules that play key roles in the emerging field of synthetic biology.
CBS offers comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling, which have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, the Center for Biological Sequence Analysis directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data, and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to the computational research, the center also has experimental efforts in gene expression analysis using DNA chips and data generation in relation to the physical and structural properties of DNA. In the last decade, the Center for Biological Sequence Analysis has produced a large number of computational methods, which are offered to others via WWW servers.
Project Tycho® is a project at the University of Pittsburgh to advance the availability and use of public health data for science and policy making. Currently, the Project Tycho® database includes data from all weekly notifiable disease reports for the United States dating back to 1888. These data are freely available to anybody interested. Additional U.S. and international data will be released twice yearly.
Marine Microbial Database of India is an initiative of the CSIR National Institute of Oceanography (NIO). It is supported by the Council of Scientific and Industrial Research (CSIR) and managed by the Biodiversity Informatics Group (BIG), Bioinformatics Centre of the NIO. It contains records on 1,814 marine microbes. Each record provides information on the microbe’s location, habitat, and importance, as well as threats to the organism. The database also provides a Taxonomic Hierarchy and Scientific Name Index.
The Bremen Core Repository (BCR), which holds International Ocean Discovery Program (IODP), Integrated Ocean Drilling Program (IODP), Ocean Drilling Program (ODP), and Deep Sea Drilling Project (DSDP) cores from the Atlantic Ocean, Mediterranean and Black Seas, and Arctic Ocean, is operated at the University of Bremen within the framework of the German participation in IODP. It is one of three IODP repositories (alongside the Gulf Coast Repository (GCR) in College Station, TX, and the Kochi Core Center (KCC), Japan). One of the scientific goals of IODP is to research the deep biosphere and the subseafloor ocean. IODP has deep-frozen microbiological samples from the subseafloor available for interested researchers and will continue to collect and preserve geomicrobiology samples for future research.
BiGG is a knowledgebase of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with standard nomenclature which allows components to be compared across different organisms. BiGG can be used to browse model content, visualize metabolic pathway maps, and export SBML files of the models for further analysis by external software packages. Users may follow links from BiGG to several external databases to obtain additional information on genes, proteins, reactions, metabolites and citations of interest.
The Cancer Immunome Atlas (TCIA) provides results of comprehensive immunogenomic analyses of next-generation sequencing (NGS) data for 19 solid cancers from The Cancer Genome Atlas (TCGA) and other data sources. TCIA was developed and is maintained at the Division of Bioinformatics (ICBI). The database can be queried for the gene expression of specific immune-related gene sets, cellular composition of immune infiltrates (characterized using gene set enrichment analyses and deconvolution), neoantigens and cancer-germline antigens, HLA types, and tumor heterogeneity (estimated from cancer cell fractions). Moreover, it provides survival analyses for different types of immunological parameters. TCIA will be constantly updated with new data and results.
This Web resource provides data and information relevant to SARS coronavirus. It includes links to the most recent sequence data and publications, to other SARS-related resources, and a pre-computed alignment of genome sequences from various isolates. The genome of SARS-CoV consists of a single, positive-strand RNA that is approximately 29,700 nucleotides long. The overall genome organization of SARS-CoV is similar to that of other coronaviruses. The reference genome includes 13 genes, which encode at least 14 proteins. Two large overlapping open reading frames (ORFs) encompass 71% of the genome. The remainder has 12 potential ORFs, including genes for structural proteins S (spike), E (small envelope), M (membrane), and N (nucleocapsid). Other potential ORFs code for unique putative SARS-CoV-specific polypeptides that lack obvious sequence similarity to known proteins.
GENCODE is a scientific project in genome research and part of the ENCODE (ENCyclopedia Of DNA Elements) scale-up project. The GENCODE consortium was initially formed as part of the pilot phase of the ENCODE project to identify and map all protein-coding genes within the ENCODE regions (approx. 1% of the human genome). Given the initial success of the project, GENCODE now aims to build an “Encyclopedia of genes and gene variants” by identifying all gene features in the human and mouse genomes using a combination of computational analysis, manual annotation, and experimental validation, and annotating all evidence-based gene features in the entire human genome at a high accuracy.
We developed a method, ChIP-sequencing (ChIP-seq), combining chromatin immunoprecipitation (ChIP) and massively parallel sequencing to identify mammalian DNA sequences bound by transcription factors in vivo. We used ChIP-seq to map STAT1 targets in interferon-gamma (IFN-gamma)-stimulated and unstimulated human HeLa S3 cells, and compared the method's performance to ChIP-PCR and to ChIP-chip for four chromosomes. The data cover chromatin immunoprecipitation of both transcription factors and histone modifications. Sequence files and the associated probability files are also provided.
The cisRED database holds conserved sequence motifs identified by genome-scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations. Sequence inputs include low-coverage genome sequence data and ENCODE data. A Nucleic Acids Research article describes the system architecture.
Greengenes is an Earth Sciences website that assists clinical and environmental microbiologists from around the globe in classifying microorganisms from their local environments. A 16S rRNA gene database addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
The goals of FMGP are to: (i) sequence complete mitochondrial genomes from all major fungal lineages, (ii) infer a robust fungal phylogeny, (iii) define the origin of the fungi, their protistan ancestors, and their specific phylogenetic link to the animals, and (iv) investigate mitochondrial gene expression, introns, RNase P RNA structures, and mobile elements.
Oral Cancer Gene Database is an initiative of the Advanced Centre for Treatment, Research and Education in Cancer, Navi Mumbai. The present database, version II, consists of 374 genes. It is developed as a user-friendly site that provides scientists with information and external links in one place. The database is accessed through a list of all genes and through Keyword Search using gene name or gene symbol, chromosomal location, CGH (in %), and molecular weight. Interaction Network shows the interaction between genes for particular biological processes and molecular functions.
This site is dedicated to making high value health data more accessible to entrepreneurs, researchers, and policy makers in the hopes of better health outcomes for all. In a recent article, Todd Park, United States Chief Technology Officer, captured the essence of what the Health Data Initiative is all about and why our efforts here are so important.
The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education.
The Taenia solium genome project is a whole genome sequencing project of the parasite Taenia solium, the causal agent of human and porcine cysticercosis, a disease that is still a relevant public health problem in Mexico. It is being carried out by a consortium of scientists belonging to diverse institutions of the Universidad Nacional Autónoma de México (UNAM, the National Autonomous University of Mexico).
Introduction to genome-scale metabolic networks: The completion of genome sequencing and subsequent functional annotation for a great number of species enables the reconstruction of genome-scale metabolic networks. These networks, together with in silico network analysis methods such as constraint-based methods (CBM) and graph theory methods, can provide a systems-level understanding of cellular metabolism. Furthermore, they can be applied to many practical biological predictions, such as gene essentiality analysis, drug target discovery and metabolic engineering.
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies.
The Africa Centre offers longitudinal datasets from a rural demographic in KwaZulu-Natal, South Africa, where HIV prevalence is extremely high. The data may be filtered by demographics, years, or individual questionnaires. The Africa Centre requests that anyone downloading its data notify them. Since January 2000, the Africa Centre for Population Health has built up an extensive longitudinal database of demographic, social, medical and economic information about the members of its Demographic Surveillance Area, which is situated in a rural area of northern KwaZulu-Natal. From this database, it has developed a suite of datasets that can be used both internally within the organisation and by other researchers.
The Swiss HIV Cohort Study (SHCS), established in 1988, is a systematic longitudinal study enrolling HIV-infected individuals in Switzerland. It is a collaboration of all Swiss University Hospital infectious disease outpatient clinics and two large cantonal hospitals, all with affiliated laboratories, together with affiliated smaller hospitals and private physicians caring for HIV patients. The Swiss Mother and Child HIV Cohort Study (MoCHiV) is integrated into the SHCS. It aims at preventing mother-to-child transmission and enrolls HIV-infected pregnant women and their children. The SHCS involves practically all researchers active in patient-oriented HIV research in Switzerland. The clinics can delegate recruitment of participants and follow-up visits to other outpatient clinics or to specialized private physicians, provided that the requirements of the protocol can be entirely fulfilled and controlled. The laboratories can contract other laboratories for some of the analyses.