Reset all


Content Types


AID systems



Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type


Metadata standards

PID systems

Provider types

Quality management

Repository languages



Repository types


  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 109 result(s)
The UC San Diego Library Digital Collections website gathers two categories of content managed by the Library: library collections (including digitized versions of selected collections covering topics such as art, film, music, history and anthropology) and research data collections (including research data generated by UC San Diego researchers).
Neotoma is a multiproxy paleoecological database that covers the Pliocene-Quaternary, including modern microfossil samples. The database is an international collaborative effort among individuals from 19 institutions, representing multiple constituent databases. There are over 20 data-types within the Neotoma Paleoecological Database, including pollen microfossils, plant macrofossils, vertebrate fauna, diatoms, charcoal, biomarkers, ostracodes, physical sedimentology and water chemistry. Neotoma provides an underlying cyberinfrastructure that enables the development of common software tools for data ingest, discovery, display, analysis, and distribution, while giving domain scientists control over critical taxonomic and other data quality issues.
The Deep Blue Data repository is a means for University of Michigan researchers to make their research data openly accessible to anyone in the world, provided they meet collections criteria. Submitted data sets undergo a curation review by librarians to support discovery, understanding, and reuse of the data.
Research Data Australia is the data discovery service of the Australian National Data Service (ANDS). We do not store the data itself here but provide descriptions of, and links to, the data from our data publishing partners. ANDS is funded by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).
TES is the first satellite instrument to provide simultaneous concentrations of carbon monoxide, ozone, water vapor and methane throughout Earth’s lower atmosphere. This lower atmosphere (the troposphere) is situated between the surface and the height at which aircraft fly, and is an important part of the atmosphere that we often impact with our activities.
The Comprehensive Epidemiologic Data Resource (CEDR) is the Department of Energy's (DOE) electronic database comprised of health studies of DOE contract workers and environmental studies of areas surrounding DOE facilities. DOE recognizes the benefits of data sharing and supports the public's right to know about worker and community health risks. CEDR provides independent researchers and the public with access to de-identified data collected since the Department's early production years. Current CEDR holdings include more than 80 studies of over 1 million workers at 31 DOE sites. Access to these data is at no cost to the user. Most of CEDR's holdings are derived from epidemiologic studies of DOE workers at many large nuclear weapons plants, such as Hanford, Los Alamos, the Oak Ridge reservation, Savannah River Site, and Rocky Flats. These studies primarily use death certificate information to identify excess deaths and patterns of disease among workers to determine what factors contribute to the risk of developing cancer and other illnesses. In addition, many of these studies have radiation exposure measurements on individual workers. CEDR is supported by the Oak Ridge Institute for Science and Education (ORISE) in Oak Ridge, Tennessee. Now a mature system in routine operational use, CEDR's modern internet-based systems respond to thousands of requests to its web server daily. With about 1,500 Internet sites pointing to CEDR's web site, CEDR is a national user facility, with a large audience for data that are not available elsewhere.
The BioStudies database holds descriptions of biological studies, links to data from these studies in other databases at EMBL-EBI or outside, as well as data that do not fit in the structured archives at EMBL-EBI. The database accepts submissions via an online tool, or in a simple tab-delimited format. It also enables authors to submit supplementary information and link to it from the publication.
The Norwegian Marine Data Centre (NMD) at the Institute of Marine Research was established as a national data centre dedicated to the professional processing and long-term storage of marine environmental and fisheries data and production of data products. The Institute of Marine Research continuously collects large amounts of data from all Norwegian seas. Data are collected using vessels, observation buoys, manual measurements, gliders – amongst others. NMD maintains the largest collection of marine environmental and fisheries data in Norway.
SeaBASS, the publicly shared archive of in situ oceanographic and atmospheric data maintained by the NASA Ocean Biology Processing Group (OBPG). High quality in situ measurements are prerequisite for satellite data product validation, algorithm development, and many climate-related inquiries. As such, the NASA Ocean Biology Processing Group (OBPG) maintains a local repository of in situ oceanographic and atmospheric data to support their regular scientific analyses. The SeaWiFS Project originally developed this system, SeaBASS, to catalog radiometric and phytoplankton pigment data used their calibration and validation activities. To facilitate the assembly of a global data set, SeaBASS was expanded with oceanographic and atmospheric data collected by participants in the SIMBIOS Program, under NASA Research Announcements NRA-96 and NRA-99, which has aided considerably in minimizing spatial bias and maximizing data acquisition rates. Archived data include measurements of apparent and inherent optical properties, phytoplankton pigment concentrations, and other related oceanographic and atmospheric data, such as water temperature, salinity, stimulated fluorescence, and aerosol optical thickness. Data are collected using a number of different instrument packages, such as profilers, buoys, and hand-held instruments, and manufacturers on a variety of platforms, including ships and moorings.
VAMDC aims to be an interoperable e-infrastructure that provides the international research community with access to a broad range of atomic and molecular (A&M) data compiled within a set of A&M databases accessible through the provision of this portal and of user software. Furthermore VAMDC aims to provide A&M data providers and compilers with a large dissemination platform for their work. VAMDC infrastructure was established to provide a service to a wide international research community and has been developed in conjunction with consultations and advice from the A&M user community.
OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families. So it serves as an important utility for automated eukaryotic genome annotation. OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm,Van Dongen 2000; is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
AmoebaDB belongs to the EuPathDB family of databases and is an integrated genomic and functional genomic database for Entamoeba and Acanthamoeba parasites. In its first iteration (released in early 2010), AmoebaDB contains the genomes of three Entamoeba species (see below). AmoebaDB integrates whole genome sequence and annotation and will rapidly expand to include experimental data and environmental isolate sequences provided by community researchers . The database includes supplemental bioinformatics analyses and a web interface for data-mining.
ToxoDB is a genome database for the genus Toxoplasma, a set of single-celled eukaryotic pathogens that cause human and animal diseases, including toxoplasmosis.
GloPAD is a multimedia, multilingual, web-accessible database containing digital images, texts, video clips, sound recordings, and complex media objects (such as 3-D images) related to the performing arts from around the world. GloPAD (Global Performing Arts Database) records include authoritative, detailed, multilingual descriptions of digital images, texts, video clips, sound recordings, and complex media objects related to the performing arts around the world, plus information about related pieces, productions, performers, and creators. GloPAC is an international organization of institutions and individuals committed to using innovative digital technologies to create easily accessible, multimedia, and multilingual information resources for the study and preservation of the performing arts.
The ADS is an accredited digital repository for heritage data that supports research, learning and teaching with freely available, high quality and dependable digital resources by preserving and disseminating digital data in the long term. The ADS also promotes good practice in the use of digital data, provides technical advice to the heritage community, and supports the deployment of digital technologies.
STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: - Genomic Context - High-throughput Experiments - (Conserved) Coexpression - Previous Knowledge STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable.
FungiDB belongs to the EuPathDB family of databases and is an integrated genomic and functional genomic database for the kingdom Fungi. FungiDB was first released in early 2011 as a collaborative project between EuPathDB and the group of Jason Stajich (University of California, Riverside). At the end of 2015, FungiDB was integrated into the EuPathDB bioinformatic resource center. FungiDB integrates whole genome sequence and annotation and also includes experimental and environmental isolate sequence data. The database includes comparative genomics, analysis of gene expression, and supplemental bioinformatics analyses and a web interface for data-mining.
WFCC Global Catalogue of Microorganisms (GCM) is expected to be a robust, reliable and user-friendly system to help culture collections to manage, disseminate and share the information related to their holdings. It also provides a uniform interface for the scientific and industrial communities to access the comprehensive microbial resource information.
BExIS is the online data repository and information system of the Biodiversity Exploratories Project (BE). The BE is a German network of biodiversity related working groups from areas such as vegetation and soil science, zoology and forestry. Up to three years after data acquisition, the data use is restricted to members of the BE. Thereafter, the data is usually public available (
The VDC is a public, web-based search engine for accessing worldwide earthquake strong ground motion data. While the primary focus of the VDC is on data of engineering interest, it is also an interactive resource for scientific research and government and emergency response professionals.
The Database of Protein Disorder (DisProt) is a curated database that provides information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part. DisProt is a community resource annotating protein sequences for intrinsically disorder regions from the literature. It classifies intrinsic disorder based on experimental methods and three ontologies for molecular function, transition and binding partner.
The arctic data archive system (ADS) collects observation data and modeling products obtained by various Japanese research projects and gives researchers to access the results. By centrally managing a wide variety of Arctic observation data, we promote the use of data across multiple disciplines. Researchers use these integrated databases to clarify the mechanisms of environmental change in the atmosphere, ocean, land-surface and cryosphere. That ADS will be provide an opportunity of collaboration between modelers and field scientists, can be expected.
The Cystic Fibrosis Mutation Database (CFTR1) was initiated by the Cystic Fibrosis Genetic Analysis Consortium in 1989 to increase and facilitate communications among CF researchers, and is maintained by the Cystic Fibrosis Centre at the Hospital for Sick Children in Toronto. The specific aim of the database is to provide up to date information about individual mutations in the CFTR gene. In a major upgrade in 2010, all known CFTR mutations and sequence variants have been converted to the standard nomenclature recommended by the Human Genome Variation Society.
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. It is used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times.