  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
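As a rough illustration of how the operators listed above combine, the following sketch assembles a query string in Python. The helper names (`prefix`, `phrase`, `fuzzy`, `all_of`, `any_of`, `exclude`) are hypothetical and not part of any registry API; the sketch only builds text to paste into the search box and assumes the operators behave as described in the list.

```python
from typing import Optional


def prefix(keyword: str) -> str:
    """Wildcard search: '*' at the end of a keyword."""
    return f"{keyword}*"


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Phrase search in quotes, optionally followed by ~N (slop amount)."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fuzzy(word: str, distance: int) -> str:
    """Fuzzy search: ~N after a word gives the desired edit distance."""
    return f"{word}~{distance}"


def all_of(*terms: str) -> str:
    """AND search: terms joined with '+' (AND is also the default)."""
    return " + ".join(terms)


def any_of(*terms: str) -> str:
    """OR search: terms joined with '|', parenthesised for priority."""
    return "(" + " | ".join(terms) + ")"


def exclude(term: str) -> str:
    """NOT operation: '-' in front of a term."""
    return f"-{term}"


# Hypothetical example query, invented for illustration only.
query = all_of(
    prefix("prote"),
    any_of(phrase("copy number"), fuzzy("genome", 1)),
    exclude("retired"),
)
print(query)
```

The parentheses added by `any_of` keep the OR group together when it is combined with the surrounding AND terms.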
Found 19 result(s)
The project brings together national key players providing environmentally related biological data and services to develop the ‘German Federation for Biological Data’ (GFBio). The overall goal is to provide a sustainable, service-oriented, national data infrastructure that facilitates data sharing and stimulates data-intensive science in the fields of biological and environmental research.
The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data. The UniProt Knowledgebase is an expertly and richly curated protein database, consisting of two sections called UniProtKB/Swiss-Prot and UniProtKB/TrEMBL.
TopFIND is a protein-centric database for the annotation of protein termini, currently in its third version. Non-canonical protein termini can be the result of multiple different biological processes, including pre-translational processes such as alternative splicing and alternative translation initiation, or post-translational protein processing by proteases that cleave proteins as part of protein maturation or as a regulatory modification. Accordingly, protein termini evidence in TopFIND is inferred from other databases such as ENSEMBL transcripts, TISdb for alternative translation initiation, MEROPS for protein cleavage by proteases, and UniProt for canonical and protein isoform start sites.
DEIMS-SDR (Dynamic Ecological Information Management System - Site and dataset registry) is an information management system that allows you to discover long-term ecosystem research sites around the globe, along with the data gathered at those sites and the people and networks associated with them. DEIMS-SDR describes a wide range of sites, providing a wealth of information, including each site’s location, ecosystems, facilities, parameters measured and research themes. It is also possible to access a growing number of datasets and data products associated with the sites. All sites and dataset records can be referenced using unique identifiers that are generated by DEIMS-SDR. It is possible to search for sites via keyword, predefined filters or a map search. By including accurate, up-to-date information in DEIMS, site managers benefit from greater visibility for their LTER site, LTSER platform and datasets, which can help attract funding to support site investments. The aim of DEIMS-SDR is to be the world's most comprehensive catalogue of environmental research and monitoring facilities, featuring foremost, but not exclusively, information about all LTER sites worldwide and providing that information to science, politics and the general public.
OpenWorm aims to build the first comprehensive computational model of Caenorhabditis elegans (C. elegans), a microscopic roundworm. With only a thousand cells, it solves basic problems such as feeding, mate-finding and predator avoidance. Despite being extremely well studied in biology, this organism still eludes a deep, principled understanding of its biology. We are using a bottom-up approach, aimed at observing the worm behaviour emerge from a simulation of data derived from scientific experiments carried out over the past decade. To do so we are incorporating the data available in the scientific community into software models. We are engineering Geppetto and Sibernetic, open-source simulation platforms, to be able to run these different models in concert. We are also forging new collaborations with universities and research institutes to collect data that fill in the gaps. All the code we produce in the OpenWorm project is Open Source and available on GitHub.
The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank. Protein structures are classified using a combination of automated and manual procedures. There are four major levels in the CATH hierarchy: Class, Architecture, Topology and Homologous superfamily.
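The four-level hierarchy can be sketched as a small parser. This is a minimal illustration that assumes a node is written as a dotted identifier with one number per level, in the order Class, Architecture, Topology, Homologous superfamily; the function names and the sample identifier are chosen for illustration and are not taken from the entry above.

```python
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_id(node_id: str) -> dict:
    """Split a dotted identifier into the four named hierarchy levels."""
    parts = node_id.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} dot-separated numbers: {node_id!r}")
    return dict(zip(LEVELS, (int(p) for p in parts)))


def ancestors(node_id: str) -> list:
    """Return the path from the Class level down to the node itself."""
    parts = node_id.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Sample identifier, assumed here only as an example input.
print(parse_cath_id("3.40.50.300"))
print(ancestors("3.40.50.300"))
```

Keeping the path as a list of prefixes makes it easy to group domains at any level, for example by comparing the first three components to test whether two domains share a Topology.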
This site provides access to complete, annotated genomes from bacteria and archaea (present in the European Nucleotide Archive) through the Ensembl graphical user interface (genome browser). Ensembl Bacteria contains genomes from annotated INSDC records that are loaded into Ensembl multi-species databases, using the INSDC annotation import pipeline.
The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs) for the CPTAC program. The portal also hosts analyses of the mass spectrometry data (mapping of spectra to peptide sequences and protein identification) from the PCCs and from a CPTAC-sponsored common data analysis pipeline (CDAP).
IMGT/GENE-DB is the IMGT genome database for IG and TR genes from human, mouse and other vertebrates. IMGT/GENE-DB provides a full characterization of the genes and of their alleles: IMGT gene name and definition, chromosomal localization, number of alleles, and for each allele, the IMGT allele functionality, and the IMGT reference sequences and other sequences from the literature. IMGT/GENE-DB allele reference sequences are available in FASTA format (nucleotide and amino acid sequences with IMGT gaps according to the IMGT unique numbering, or without gaps).
The Progenetix database provides an overview of copy number abnormalities in human cancer, currently drawing on 32548 array and chromosomal Comparative Genomic Hybridization (CGH) experiments, as well as Whole Genome or Whole Exome Sequencing (WGS, WES) studies. The cancer profile data in Progenetix was curated from 1031 articles and represents 366 different cancer types, according to the International Classification of Diseases for Oncology (ICD-O).
caArray Retirement Announcement: The National Cancer Institute (NCI) Center for Biomedical Informatics and Information Technology (CBIIT) instance of the caArray database was retired on March 31st, 2015. All publicly accessible caArray data and annotations will be archived and will remain available via FTP download; they are also available at GEO. While NCI will not be able to provide technical support for the caArray software after the retirement, the source code is available on GitHub, and we encourage continued community development. Molecular Analysis of Brain Neoplasia (Rembrandt fine-00037) gene expression data has been loaded into ArrayExpress. caArray is an open-source, web and programmatically accessible microarray data management system that supports the annotation of microarray data using MAGE-TAB and web-based forms. Data and annotations may be kept private to the owner, shared with user-defined collaboration groups, or made public. The NCI instance of caArray hosts many cancer-related public datasets available for download.
This is CSDB version 1, merged from the Bacterial (BCSDB) and Plant&Fungal (PFCSDB) Carbohydrate Structure Databases. The database aims to provide structural, bibliographic, taxonomic, NMR spectroscopic and other information on glycan and glycoconjugate structures of prokaryotic, plant and fungal origin. The key points of this service are as follows. High coverage: the coverage for bacteria (up to 2016) and archaea (up to 2016) is above 80%. Similar coverage for plants and fungi is expected in the future. The database is close to complete up to 1998 for plants, and up to 2006 for fungi. Data quality: high data quality is achieved by manual curation using original publications, assisted by multiple automatic procedures for error control. Errors present in publications are reported and corrected, when possible. Data from other databases are verified on import. Detailed annotations: structural data are supplied with extended bibliography, assigned NMR spectra, taxon identification including strains and serogroups, and other information if available in the original publication. Services: CSDB serves as a platform for a number of computational services tuned for glycobiology, such as NMR simulation, automated structure elucidation, taxon clustering, 3D molecular modeling, statistical processing of data etc. Integration: CSDB is cross-linked to other glycoinformatics projects and NCBI databases. The data are exportable in various formats, including the most widespread encoding schemes and records using the GlycoRDF ontology. Free web access: users can access the database for free via its web interface (see Help). The main source of data is retrospective literature analysis. About 20% of the data were imported from CCSD (Carbbank, University of Georgia, Athens; structures published before 1996) with subsequent manual curation and approval.
The current coverage is displayed in red at the top of the left menu. The time lag between the publication of new data and their deposition into CSDB is ca. 1 year. In the scope of bacterial carbohydrates, CSDB covers nearly all structures of this origin published up to 2016. "Prokaryotic, plant and fungal" means that a glycan was found in organism(s) belonging to these taxonomic domains or was obtained by modification of those found in them. "Carbohydrate" means a structure composed of any residues linked by glycosidic, ester, amidic, ketal, phospho- or sulpho-diester bonds in which at least one residue is a sugar or its derivative.
The LOVD portal provides LOVD software and access to a list of worldwide LOVD applications through the Locus Specific Database list and the List of Public LOVD installations. The LOVD installations that have opted to be included in the global LOVD listing are included in the overall LOVD querying service, which is based on an API.
The Ensembl project produces genome databases for vertebrates and other eukaryotic species. Ensembl is a joint project between the European Bioinformatics Institute (EBI), an outstation of the European Molecular Biology Laboratory (EMBL), and the Wellcome Trust Sanger Institute (WTSI) to develop a software system that produces and maintains automatic annotation on selected genomes. The Ensembl project was started in 1999, some years before the draft human genome was completed. Even at that early stage it was clear that manual annotation of 3 billion base pairs of sequence would not be able to offer researchers timely access to the latest data. The goal of Ensembl was therefore to automatically annotate the genome, integrate this annotation with other available biological data and make all this publicly available via the web. Since the website's launch in July 2000, many more genomes have been added to Ensembl and the range of available data has also expanded to include comparative genomics, variation and regulatory data. Both institutes are located on the Wellcome Trust Genome Campus in Hinxton, south of the city of Cambridge, United Kingdom.
InnateDB is a publicly available database of the genes, proteins, experimentally verified interactions and signaling pathways involved in the innate immune response of humans, mice and bovines to microbial infection. The database captures an improved coverage of the innate immunity interactome by integrating known interactions and pathways from major public databases together with manually curated data into a centralised resource. The database can be mined as a knowledgebase or used with our integrated bioinformatics and visualization tools for the systems-level analysis of the innate immune response.