  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) imply priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
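As a rough illustration of how the operators listed above combine, the following sketch composes a query string from small helper functions. The helper names (`wildcard`, `phrase`, `fuzzy`, `all_of`, `any_of`, `exclude`) are hypothetical and are not part of the search interface; only the operator characters come from the list above.

```python
from typing import Optional


def wildcard(prefix: str) -> str:
    """Keyword followed by * for a wildcard search."""
    return f"{prefix}*"


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Quoted phrase, optionally followed by ~N for the slop amount."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fuzzy(word: str, distance: int) -> str:
    """Single word followed by ~N for the edit distance."""
    return f"{word}~{distance}"


def all_of(*terms: str) -> str:
    """Join terms with + (AND, which the list gives as the default)."""
    return " + ".join(terms)


def any_of(*terms: str) -> str:
    """Join terms with | (OR), wrapped in parentheses for priority."""
    return "(" + " | ".join(terms) + ")"


def exclude(term: str) -> str:
    """Prefix a term with - for a NOT operation."""
    return f"-{term}"


query = all_of(
    wildcard("genom"),
    any_of(phrase("protein interaction"), fuzzy("orthologue", 1)),
    exclude("malaria"),
)
print(query)
```

The resulting string can be pasted into the search box; how the search backend ranks or tokenizes the terms is not described in this listing.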
Found 1291 result(s)
Comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, the Center for Biological Sequence Analysis (CBS) directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data, and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to the computational research, the center also has experimental efforts in gene expression analysis using DNA chips and data generation in relation to the physical and structural properties of DNA. In the last decade, the Center for Biological Sequence Analysis has produced a large number of computational methods, which are offered to others via WWW servers.
Using serial analysis of gene expression (SAGE) and microarrays, we are examining total mRNA populations in all developmental stages, both in whole worms and in specific cells and tissues. In addition, we are building promoter::GFP constructs to monitor gene expression in transgenic worms, focusing on C. elegans genes that have human orthologues. Also available are web-based PCR primer design tools, and access to information about our C. elegans Fosmid library.
The objective of this Research Coordination Network project is to develop an international network of researchers who use genetic methodologies to study the ecology and evolution of marine organisms in the Indo-Pacific, so that they can share data, ideas and methods. DIPnet was created to advance genetic diversity research in the Indo-Pacific by aggregating population genetic metadata into a searchable database (GeOME).
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.
OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families, so it serves as an important utility for automated eukaryotic genome annotation. OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm, Van Dongen 2000) is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
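The cross-genome reciprocal best hit step mentioned in the OrthoMCL description can be sketched as follows. This is a minimal illustration, not OrthoMCL's actual implementation: the input format (a dictionary of similarity scores keyed by query and subject protein), the function names, and the sample scores are all assumptions, and the sketch omits in-paralog detection, weight normalization and the MCL clustering itself.

```python
from typing import Dict, Set, Tuple

# Hypothetical input format: (query protein, subject protein) -> similarity score
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query protein to its highest-scoring subject protein."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return {(a, b) for a, b in a_best.items() if b_best.get(b) == a}


# Made-up scores for two tiny genomes A and B, for illustration only.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0,
          ("a2", "b2"): 120.0, ("a3", "b1"): 90.0}
b_vs_a = {("b1", "a1"): 195.0, ("b1", "a3"): 85.0,
          ("b2", "a2"): 118.0, ("b2", "a1"): 40.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(pairs))
```

In this toy input, `a3` has `b1` as its best hit, but `b1` prefers `a1`, so `a3` does not form a candidate ortholog pair.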
AmoebaDB belongs to the EuPathDB family of databases and is an integrated genomic and functional genomic database for Entamoeba and Acanthamoeba parasites. In its first iteration (released in early 2010), AmoebaDB contains the genomes of three Entamoeba species (see below). AmoebaDB integrates whole genome sequence and annotation and will rapidly expand to include experimental data and environmental isolate sequences provided by community researchers. The database includes supplemental bioinformatics analyses and a web interface for data-mining.
ToxoDB is a genome database for the genus Toxoplasma, a set of single-celled eukaryotic pathogens that cause human and animal diseases, including toxoplasmosis.
The NCAA Student-Athlete Experiences Data Archive provides access to data about student athletes and will grow to include a handful of user-friendly data collections related to graduation rates; team-level Academic Progress Rates in Division I; and individual-level data on the experiences of current and former student-athletes from the NCAA's Growth, Opportunities, Aspirations and Learning of Students in college study (GOALS), and the Study of College Outcomes and Recent Experiences (SCORE). In the long run, the NCAA expects to follow this initial release with the publication of as much data as possible from its archives. The data is used by college presidents, athletic personnel, faculty, student-athlete groups, media members, and researchers in looking at issues related to intercollegiate athletics and higher education.
heidICON is provided by Heidelberg University Library and is the "Virtual Slide Collection" of Heidelberg University, which is still being organized. In addition to recording graphic material of current interest for research and teaching, the University's departments and institutes can digitize and transfer their existing slide collections.
The DOE Data Explorer (DDE) is an information tool to help you locate DOE's collections of data and non-text information and, at the same time, retrieve individual datasets within some of those collections. It includes collection citations prepared by the Office of Scientific and Technical Information, as well as citations for individual datasets submitted from DOE Data Centers and other organizations.
ScholarSphere is a secure repository service enabling the Penn State community to share its research and scholarly work with a worldwide audience. Faculty, staff, and students can use ScholarSphere to collect their work in one location and create a durable and citeable record of their papers, presentations, publications, data sets, or other scholarly creations. Through this service, Penn State researchers can also comply with grant-funding-agency requirements for sharing and managing research data.
This site provides information about the NIH MRI Study of Normal Brain Development (Pediatric MRI Study) and resulting Pediatric MRI Data Repository. This website serves as the portal through which data can be obtained by qualified researchers. The overarching goal of the Pediatric MRI Study is to foster a better understanding of normal brain maturation as a basis for understanding atypical brain development associated with a variety of disorders and diseases.
DataBank is a repository that will keep data safe in the long term. It can automatically obtain a Digital Object Identifier (DOI) for each data package, and make the metadata and/or the underlying data searchable and accessible by the wider world.
Open Research Exeter (ORE) is the University of Exeter's repository for all types of research, including research papers, research data and theses. Research in ORE can be viewed and downloaded freely by anyone, anywhere: researchers, students, industry, business and the wider public. ORE's content includes journal articles, conference papers, working papers, reports, book chapters, videos, audio, images, multimedia research project outputs, raw data and analysed data. ORE's content is securely stored, managed and preserved to ensure free, permanent access.
KU ScholarWorks is the digital repository of the University of Kansas. It contains scholarly work created by KU faculty, staff and students, as well as material from the University Archives. KU ScholarWorks makes important research and historical items available to a wider audience and helps assure their long-term preservation.
The Organelle Genome Megasequencing Program (OGMP) provides mitochondrial, chloroplast, and mitochondrial plasmid genome data. OGMP tools allow direct comparison of OGMP and NCBI validated records. Includes GOBASE, a taxonomically broad organelle genome database that organizes and integrates diverse data related to mitochondria and chloroplasts.
The WorldWide Antimalarial Resistance Network (WWARN) is a collaborative platform generating innovative resources and reliable evidence to inform the malaria community on the factors affecting the efficacy of antimalarial medicines. Access to data is provided through diverse Tools and Resources: WWARN Explorer, Molecular Surveyor K13 Methodology, Molecular Surveyor pfmdr1 & pfcrt, Molecular Surveyor dhfr & dhps.
Exposures in the period from conception to early childhood - including fetal growth, cell division, and organ functioning - may have long-lasting impact on health and disease susceptibility. To investigate these issues the Danish National Birth Cohort (Better health in generations) was established. A large cohort of pregnant women with long-term follow-up of the offspring was the obvious choice because many of the exposures of interest cannot be reconstructed with sufficient validity back in time. The study needed to be large, and the aim was to recruit 100,000 women early in pregnancy, and to continue follow-up for decades. Exposure information was collected by computer-assisted telephone interviews with the women twice during pregnancy and when their children were six and 18 months old. Participants were also asked to fill in a self-administered food frequency questionnaire in mid-pregnancy. Furthermore, a biological bank has been set up with blood taken from the mother twice during pregnancy and blood from the umbilical cord taken shortly after birth.
The Cognitive Function and Ageing Studies (CFAS) are population-based studies of individuals aged 65 years and over living in the community, including institutions. Together they form the only large multi-centred population-based study in the UK that has reached sufficient maturity. There are three main studies within the CFAS group. MRC CFAS, the original study, began in 1989, with three of its sites providing a parent subset for the comparison two decades later with CFAS II (2008 onwards). Subsequently, another CFAS study, CFAS Wales, began in 2011.
The Diabetes Study of Northern California (DISTANCE) conducts epidemiological and health services research in diabetes among a large, multiethnic cohort of patients in a large, integrated health care delivery system.
The FDZ-DZA (Forschungsdatenzentrum DZA) is a facility of the German Centre of Gerontology (Deutsches Zentrum für Altersfragen, DZA) and has received accreditation as the DZA research data center from the German Data Forum (RatSWD). Its main task is to make data of the German Ageing Survey (DEAS) and the German Survey on Volunteering (FWS) accessible to researchers by providing user-friendly Scientific Use Files (SUF), documentation of the contents and instruments, as well as support for scholars using the data.
The Institute of Ocean Sciences (IOS)/Ocean Sciences Division (OSD) data archive contains the holdings of oceanographic data generated by the IOS and other agencies and laboratories, including the Institute of Oceanography at the University of British Columbia and the Pacific Biological Station. The contents include data from B.C. coastal waters and inlets, B.C. continental shelf waters, open ocean North Pacific waters, Beaufort Sea and the Arctic Archipelago.
State of the Salmon provides data on abundance, diversity, and ecosystem health of wild salmon populations specific to the Pacific Ocean, North Western North America, and Asia. Data downloads are available using two geographic frameworks: Salmon Ecoregions or Hydro 1K.
STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: Genomic Context, High-throughput Experiments, (Conserved) Coexpression, and Previous Knowledge. STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable.
The species included in PlantTFDB 4.0 cover the main lineages of green plants. Therefore, PlantTFDB provides genomic TF repertoires across Viridiplantae. To provide comprehensive information for the TF family, a brief introduction and key references are presented for each family. Comprehensive annotations are made for each identified TF, including functional domains, 3D structures, gene ontology (GO), plant ontology (PO), expression information, expert-curated functional description, regulation information, interaction, conserved elements, references, and annotations in various databases such as UniProt, RefSeq, TransFac, STRING, and VISTA. By inferring orthologous groups and constructing phylogenetic trees, evolutionary relationships among identified TFs were inferred. In addition, PlantTFDB has a simple and user-friendly interface that allows users to query based on combined conditions or to run sequence similarity searches using BLAST.