  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
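The operators listed above can be combined into a single query string. The following is a minimal sketch in Python of how such a string might be assembled. The helper names (`phrase`, `prefix`, `fuzzy`, `any_of`, `all_of`, `exclude`) and the example search terms are hypothetical and only illustrate the syntax; the sketch builds text and does not contact the registry, so how the search engine ranks or interprets the result is not verified here.

```python
# Hypothetical helpers that assemble a query string from the operators above.
# They only build text; nothing here talks to the search service.

def phrase(text: str) -> str:
    """Wrap words in double quotes so they are searched as one phrase."""
    return f'"{text}"'

def prefix(stem: str) -> str:
    """Append * to a keyword stem for a wildcard search."""
    return f"{stem}*"

def fuzzy(word: str, distance: int) -> str:
    """Append ~N to a word to request the given edit distance."""
    return f"{word}~{distance}"

def any_of(*terms: str) -> str:
    """Join terms with | (OR) and group them with parentheses."""
    return "(" + " | ".join(terms) + ")"

def all_of(*terms: str) -> str:
    """Join terms with + (AND)."""
    return " + ".join(terms)

def exclude(term: str) -> str:
    """Prefix a term with - (NOT)."""
    return f"-{term}"

# Example terms are illustrative assumptions, not taken from the result list.
query = all_of(
    prefix("genom"),
    any_of(phrase("gene expression"), fuzzy("proteome", 1)),
    exclude("plant"),
)
print(query)  # genom* + ("gene expression" | proteome~1) + -plant
```

The parentheses matter here: without them, the OR would not be evaluated as one unit before being combined with the other terms.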
Found 103 result(s)
GeneWeaver combines cross-species data and gene entity integration, scalable hierarchical analysis of user data with a community-built and curated data archive of gene sets and gene networks, and tools for data-driven comparison of user-defined biological, behavioral and disease concepts. GeneWeaver allows users to integrate gene sets across species, tissue and experimental platform. It differs from conventional gene set over-representation analysis tools in that it allows users to evaluate intersections among all combinations of a collection of gene sets, including, but not limited to, annotations to controlled vocabularies. There are numerous applications of this approach. Sets can be stored, shared and compared privately, among user-defined groups of investigators, and across all users.
During the cell cycle, numerous proteins are temporally and spatially localized in distinct sub-cellular regions, including the centrosome (spindle pole in budding yeast), kinetochore/centromere, cleavage furrow/midbody (the related or homologous structures in plants and budding yeast are called the phragmoplast and bud neck, respectively), telomere and spindle. These sub-cellular regions play important roles in various biological processes. In this work, we have collected all proteins identified as localized on the kinetochore, centrosome, midbody, telomere and spindle from two fungi (S. cerevisiae and S. pombe) and five animals (C. elegans, D. melanogaster, X. laevis, M. musculus and H. sapiens), based on the rationale of "Seeing is believing" (Bloom K et al., 2005). Through ortholog searches, the proteins potentially localized at these sub-cellular regions were detected in 144 eukaryotes. On this basis, the integrated and searchable database MiCroKiTS (Midbody, Centrosome, Kinetochore, Telomere and Spindle) has been established.
The Centre for Applied Genomics hosts a variety of databases related to ongoing supported projects. Curation of these databases is performed in-house by TCAG Bioinformatics staff. The Autism Chromosome Rearrangement Database, The Cystic Fibrosis Mutation Database, and The Lafora Progressive Myoclonus Epilepsy Mutation and Polymorphism Database are included. Large Scale Genomics Research resources include the Database of Genomic Variants, The Chromosome 7 Annotation Project, The Human Genome Segmental Duplication Database, and the Non-Human Segmental Duplication Database.
The Alzheimer Disease & Frontotemporal Dementia Mutation Database (AD&FTDMDB) aims at collecting all known mutations in the genes related to Alzheimer disease (AD) and frontotemporal dementias (FTD). Mutations are collected from the literature and from presentations at scientific meetings. In addition, mutations can be submitted to AD&FTDMDB at this web site.
The Plasmid Information Database (PlasmID) was established in 2004 to curate, maintain, and distribute cDNA and ORF constructs for use in basic molecular biological research. The materials deposited at our facility represent the culmination of several international collaborative efforts from 2004 to present: Beth Israel Deaconess Medical Center, Boston Children's Hospital, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Harvard School of Public Health, and Massachusetts General Hospital.
Recode2 is a database of genes that utilize non-standard translation for gene expression purposes. Recoding events described in the database include programmed ribosomal frameshifting, translational bypassing (aka hopping) and mRNA specific codon redefinition. Frameshifting at a particular site often yields two protein products from one coding sequence and sometimes serves a regulatory purpose by acting as a sensor of the level of product protein or of some external ligand. Bypassing (hopping) allows the coupling of two ORFs separated on an mRNA by a coding gap. Codon redefinition occurs when a stop codon is decoded as a standard amino acid (often glutamine or tryptophan), or the 21st amino acid selenocysteine. These recoding events are in competition with standard decoding and are site specific. The efficiency of recoding is often modulated by cis-stimulators and sometimes by trans-factors. The sequences of the genes that use recoding for their expression are in the database. The recoding sites and the known stimulatory signals are annotated in the database together with notes on factors that are known to affect recoding efficiencies.
>>>>!!!!<<<< The Cancer Genomics Hub mission is now completed. The Cancer Genomics Hub was established in August 2011 to provide a repository to The Cancer Genome Atlas, the childhood cancer initiative Therapeutically Applicable Research to Generate Effective Treatments and the Cancer Genome Characterization Initiative. CGHub rapidly grew to be the largest database of cancer genomes in the world, storing more than 2.5 petabytes of data and serving downloads of nearly 3 petabytes per month. As the central repository for the foundational genome files, CGHub streamlined team science efforts as data became as easy to obtain as downloading from a hard drive. The convenient access to Big Data, and the collaborations that CGHub made possible, are now essential to cancer research. That work continues at the NCI's Genomic Data Commons. All files previously stored at CGHub can be found there. The Website for the Genomic Data Commons is here: >>>>!!!!<<<< The Cancer Genomics Hub (CGHub) is a secure repository for storing, cataloging, and accessing cancer genome sequences, alignments, and mutation information from the Cancer Genome Atlas (TCGA) consortium and related projects. Access to CGHub Data: All researchers using CGHub must meet the access and use criteria established by the National Institutes of Health (NIH) to ensure the privacy, security, and integrity of participant data. CGHub also hosts some publicly available data, in particular data from the Cancer Cell Line Encyclopedia. All metadata is publicly available and the catalog of metadata and associated BAMs can be explored using the CGHub Data Browser.
Complete Genomics provides free public access to a variety of whole human genome data sets generated from Complete Genomics’ sequencing service. The research community can explore and familiarize themselves with the quality of these data sets, review the data formats provided from our sequencing service, and augment their own research with additional summaries of genomic variation across a panel of diverse individuals. The quality of these data sets is representative of what a customer can expect to receive for their own samples. This public genome repository comprises genome results from both our Standard Sequencing Service (69 standard, non-diseased samples) and the Cancer Sequencing Service (two matched tumor and normal sample pairs). In March 2013 Complete Genomics was acquired by BGI-Shenzhen, the world’s largest genomics services company. BGI is headquartered in Shenzhen, China, and provides comprehensive sequencing and bioinformatics services for commercial science, medical, agricultural and environmental applications. Complete Genomics is now focused on building a new generation of high-throughput sequencing technology and developing new research, clinical and consumer applications.
The Global Proteome Machine (GPM) is a protein identification database. This data repository allows users to post and compare results. GPM's data is provided by contributors such as The Informatics Factory, the University of Michigan, and Pacific Northwest National Laboratory. The GPM searchable databases are: GPMDB, pSYT, SNAP, MRM, PEPTIDE and HOT.
Mapping, copy number analysis, sequence and gene expression data generated by the High Resolution Analysis of Follicular Lymphoma Genomes project. The data will be available for 24 patients with follicular lymphoma. All data will be made as widely and freely available as possible while safeguarding the privacy of participants and protecting confidential and proprietary data. The data from this project will be submitted to public genomic data sources. These sources will be listed on this web site as the data become available in them.
HomoMINT is a web-available tool that extends protein-protein interactions experimentally verified in model organisms to the orthologous proteins in Homo sapiens. Similar to other approaches, the orthology groups in HomoMINT are obtained by the reciprocal best hit method as implemented in the Inparanoid algorithm.
LifeMap Discovery® is a compendium of embryonic development for stem cell research and regenerative medicine, constructed by integrating extensive molecular, cellular, anatomical and medical data curated from scientific literature and high-throughput data sources.
The Human Ageing Genomic Resources (HAGR) is a collection of databases and tools designed to help researchers study the genetics of human ageing using modern approaches such as functional genomics, network analyses, systems biology and evolutionary analyses.
The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs) for the CPTAC program. The portal also hosts analyses of the mass spectrometry data (mapping of spectra to peptide sequences and protein identification) from the PCCs and from a CPTAC-sponsored common data analysis pipeline (CDAP).
The 1000 Genomes Project is an international collaboration to produce an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts. This resource will support genome-wide association studies and other medical research studies. The genomes of about 2500 unidentified people from about 25 populations around the world will be sequenced using next-generation sequencing technologies. The results of the study will be freely and publicly accessible to researchers worldwide.
The DSMZ is one of the largest biological resource centers worldwide. Its collections currently comprise more than 50,000 items, including about 27,000 different bacterial and 4,000 fungal strains, 800 human and animal cell lines, 700 plant cell lines, 1,400 plant viruses and antisera, and 13,000 different types of bacterial genomic DNA. All biological materials accepted in the DSMZ collection are subject to extensive quality control and physiological and molecular characterization by our central services. In addition, DSMZ provides extensive documentation and detailed diagnostic information on the biological materials. The unprecedented diversity and quality management of its bioresources make the DSMZ an internationally renowned supplier for science, diagnostic laboratories, national reference centers, and industrial partners.
The Melanoma Molecular Map Project (MMMP) is an open access, interactive web-based multidatabase dedicated to research on melanoma biology and therapy. The aim of this non-profit project is to create an organized and continuously updated databank collecting the large and ever-growing body of scientific knowledge on melanoma currently scattered across thousands of articles published in hundreds of journals.
Stemformatics is a collaboration between the stem cell and bioinformatics community. We were motivated by the plethora of exciting cell models in the public and private domains, and the realisation that for many biologists these were mostly inaccessible. We wanted a fast way to find and visualise interesting genes in these exemplar stem cell datasets. We'd like you to explore. You'll find data from leading stem cell laboratories in a format that is easy to search, easy to visualise and easy to export.
BioGPS is a free gene annotation portal built with two guiding principles in mind: customizability and extensibility. It is a complete resource for learning about gene and protein function.
The Conserved Domain Database is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.