  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) indicate precedence
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
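The operators listed above can be combined in a single query string. The following is a minimal sketch that only assembles such strings; it does not contact the search service. All helper names (`phrase`, `fuzzy`, `prefix`, `any_of`, `all_of`, `exclude`) are hypothetical and chosen for illustration, and how the server interprets a combined query is an assumption based solely on the operator list.

```python
from typing import Optional


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Quote a phrase; optionally append ~N as the slop amount."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fuzzy(word: str, distance: int) -> str:
    """Append ~N to a single word as the desired edit distance."""
    return f"{word}~{distance}"


def prefix(stem: str) -> str:
    """Append * to a keyword for a wildcard search."""
    return f"{stem}*"


def any_of(*terms: str) -> str:
    """Join terms with | (OR) and group them with parentheses."""
    return "(" + " | ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with + (AND, which the page lists as the default)."""
    return " + ".join(terms)


def exclude(term: str) -> str:
    """Prefix a term with - (NOT)."""
    return f"-{term}"


# Hypothetical example query: a phrase with slop, an OR group, and an exclusion.
query = all_of(
    phrase("nucleotide sequence", slop=2),
    any_of(prefix("genom"), fuzzy("protein", 1)),
    exclude("offline"),
)
print(query)
```

The resulting string can be pasted into the search box; the grouping with parentheses keeps the OR alternatives together before they are combined with the other terms.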
Found 46 result(s)
Repository type(s)
  • disciplinary
Provider type(s)
  • serviceProvider
The NCBI Nucleotide database collects sequences from such sources as GenBank, RefSeq, TPA, and PDB. Sequences collected relate to genome, gene, and transcript sequence data, and provide a foundation for research related to the biomedical field.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources. These include submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centres and routine and comprehensive exchange with our partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
DDBJ (DNA Data Bank of Japan) is the sole nucleotide sequence data bank in Asia that is officially certified to collect nucleotide sequences from researchers and to issue internationally recognized accession numbers to data submitters. Since we exchange the collected data with EMBL-Bank/EBI (European Bioinformatics Institute) and GenBank/NCBI (National Center for Biotechnology Information) on a daily basis, the three data banks share virtually the same data at any given time. The virtually unified database is called "INSD" (International Nucleotide Sequence Database). DDBJ collects sequence data mainly from Japanese researchers, but also accepts data from, and issues accession numbers to, researchers in any other country.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
GenBank® is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assigns accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
<<<!!!<<< Efforts to obtain renewed funding after 2008 were unfortunately not successful. PANDIT has therefore been frozen since November 2008, and its data have not been updated since September 2005, when version 17.0 was released (corresponding to Pfam 17.0). The existing data and website remain available from these pages, and should remain stable and, we hope, useful. >>>!!!>>> PANDIT is a collection of multiple sequence alignments and phylogenetic trees. It contains corresponding amino acid and nucleotide sequence alignments, with trees inferred from each alignment. PANDIT is based on the Pfam database (Protein families database of alignments and HMMs), and includes the seed amino acid alignments of most families in the Pfam-A database. DNA sequences for as many members of each family as possible are extracted from the EMBL Nucleotide Sequence Database and aligned according to the amino acid alignment. PANDIT also contains a further copy of the amino acid alignments, restricted to the sequences for which DNA sequences were found.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
The NCBI Short Genetic Variations database, commonly known as dbSNP, catalogs short variations in nucleotide sequences from a wide range of organisms. These variations include single nucleotide variations, short nucleotide insertions and deletions, short tandem repeats and microsatellites. Short Genetic Variations may be common, thus representing true polymorphisms, or they may be rare. Some rare human entries have additional information associated with them, including disease associations, genotype information and allele origin, as some variations are somatic rather than germline events. ***NCBI will phase out support for non-human organism data in dbSNP and dbVar beginning on September 1, 2017***
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
BioSamples stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. Samples are either 'reference' samples (e.g. from 1000 Genomes, HipSci, FAANG) or have been used in an assay database such as the European Nucleotide Archive (ENA) or ArrayExpress.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
The Indian Biological Data Centre (IBDC), located at the Regional Centre for Biotechnology (RCB) in Faridabad, aims to create a centralized repository for biological and biotechnology data in India, addressing the lack of infrastructure for data sharing and management. Established with support from the Department of Biotechnology and in collaboration with the National Institute of Immunology and the International Centre for Biotechnology & Genetic Engineering, the IBDC will store diverse data types, including genomic, proteomic, and imaging data. Its key objectives include developing an IT platform for data storage and retrieval, establishing standard operating procedures for data management according to FAIR principles, implementing analytical software for data analysis, and conducting training programs to promote data sharing among researchers. This initiative is crucial for enhancing data-dependent research and fostering collaboration in India's life sciences landscape.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
Covalent DNA modifications have been found in numerous organisms and more are continually being discovered and characterized, as detection methods improve. Many of these modifications can affect the conformation of the DNA double helix, often resulting in downstream effects upon transcription factor binding. Some of these modifications have been demonstrated to be stable, while others are viewed as merely transient. DNAmod catalogues information on known DNA modifications, of which the well-known 5-methylcytosine is only one. It aims to profile modifications' properties, building upon data contained within the Chemical Entities of Biological Interest (ChEBI) database. It also provides literature citations and includes curated annotations on mapping techniques and natural occurrence information.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
MGnify (formerly: EBI Metagenomics) offers an automated pipeline for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples. Users can submit their own data for analysis or freely browse all of the analysed public datasets held within the repository. In addition, users can request analysis of any appropriate dataset within the European Nucleotide Archive (ENA). User-submitted or ENA-derived datasets can also be assembled on request, prior to analysis.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
<<<!!!<<< GSS sequences are now being merged into the NCBI Nucleotide database >>>!!!>>>
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
NCBI Virus is a community portal for viral sequence data from RefSeq, GenBank and other NCBI repositories. To find, retrieve and analyze data, choose one of the offered options.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
dbEST is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms. Expressed Sequence Tags (ESTs) are short (usually about 300-500 bp), single-pass sequence reads from mRNA (cDNA). Typically they are produced in large batches. They represent a snapshot of genes expressed in a given tissue and/or at a given developmental stage. They are tags (some coding, others not) of expression for a given cDNA library. Most EST projects develop large numbers of sequences. These are commonly submitted to GenBank and dbEST as batches of dozens to thousands of entries, with a great deal of redundancy in the citation, submitter and library information. To improve the efficiency of the submission process for this type of data, we have designed a special streamlined submission process and data format. dbEST also includes sequences that are longer than the traditional ESTs, or are produced as single sequences or in small batches. Among these sequences are products of differential display experiments and RACE experiments. The thing that these sequences have in common with traditional ESTs, regardless of length, quality, or quantity, is that there is little information that can be annotated in the record. If a sequence is later characterized and annotated with biological features such as a coding region, 5'UTR, or 3'UTR, it should be submitted through the regular GenBank submissions procedure (via BankIt or Sequin), even if part of the sequence is already in dbEST. dbEST is reserved for single-pass reads. Assembled sequences should not be submitted to dbEST. GenBank will accept assembled EST submissions for the forthcoming TSA (Transcriptome Shotgun Assembly) division. The individual reads which make up the assembly should be submitted to dbEST, the Trace archive or the Short Read Archive (SRA) prior to the submission of the assemblies.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
The National Microbial Data Center (NMDC) is jointly constructed by the Institute of Microbiology of the Chinese Academy of Sciences (IMS), the Institute of Oceanography of the Chinese Academy of Sciences, the Institute of Infectious Diseases of the Chinese Center for Disease Control and Prevention, the Institute of Plant Physiology and Ecology of the Chinese Academy of Sciences, and the Computer Network Information Centre of the Chinese Academy of Sciences. The General Office of the Chinese Academy of Sciences is the parent department. The data resources cover the whole life cycle of microbiological research, including microbiological resources, microbiological and cross-technological methods, research processes and engineering, microbiomics, microbiological technologies, as well as microbiological literature, patents, experts and results. The Centre focuses on promoting the convergence and integration of scientific and technological resources in the field of microbiology to the national platform, strengthening the development, application and analysis of microbiological resources, enhancing the effective use of microbiological resources and the ability to support scientific and technological innovation, and providing high-quality scientific and technological resource sharing services for scientific research, technological progress and social development.
Repository type(s)
  • disciplinary
  • institutional
Provider type(s)
  • dataProvider
  • serviceProvider
The Conserved Domain Database is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.
Repository type(s)
  • disciplinary
  • institutional
Provider type(s)
  • dataProvider
TPA is a database that contains sequences built from the existing primary sequence data in GenBank. TPA records are retrieved through the Nucleotide Database and feature information on the sequence, how it was cataloged, and the proper way to cite the sequence information.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
PHI-base is a web-accessible database that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, Oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is therefore an invaluable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. In collaboration with the FRAC team, PHI-base also includes antifungal compounds and their target genes.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
A curated database of mutations and polymorphisms associated with Lafora Progressive Myoclonus Epilepsy. The Lafora progressive myoclonus epilepsy mutation and polymorphism database is a collection of hand-curated mutation and polymorphism data for the EPM2A and EPM2B (NHLRC1) genes from publicly available literature, databases and unpublished data. The database is continuously updated with information from in-house experimental data as well as data from published research studies.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
Genomic Expression Archive (GEA) is a public database of functional genomics data such as gene expression, epigenetics and genotyping SNP array data. Both microarray- and sequence-based data are accepted in the MAGE-TAB format in compliance with MIAME and MINSEQE guidelines, respectively. GEA issues accession numbers of the form E-GEAD-n to experiments and A-GEAD-n to array designs. Data exchange between GEA and EBI ArrayExpress is planned.
Repository type(s)
  • disciplinary
Provider type(s)
  • serviceProvider
  • dataProvider
The China National GeneBank database (CNGBdb) is a unified platform for biological big data sharing and application services. CNGBdb has now integrated a large amount of internal and external biological data from resources such as CNGB, NCBI, and the EBI. There are several sub-databases in CNGBdb, including literature, variation, gene, genome, protein, sequence, organism, project, sample, experiment, run, and assembly. Based on underlying big data and cloud computing technologies, it provides various data services, including archive, analysis, knowledge search, and management authorization of biological data. CNGBdb adopts data structures and standards of international omics, health, and medicine, such as the International Nucleotide Sequence Database Collaboration (INSDC), the Global Alliance for Genomics and Health (GA4GH), the Global Genome Biodiversity Network (GGBN), and the American College of Medical Genetics and Genomics (ACMG), and constructs standardized data and structures with wide compatibility. All public data and services provided by CNGBdb are freely available to all users worldwide. CNGB Sequence Archive (CNSA) is the bionomics data repository of CNGBdb. It is a convenient and efficient archiving system for multi-omics data in life science, which provides archiving services for raw sequencing reads and further analyzed results. CNSA follows the international data standards for omics data, and supports online and batch submission of multiple data types such as Project, Sample, Experiment/Run, Assembly, Variation, Metabolism, Single cell, and Sequence. Moreover, CNSA has achieved the correlation of sample entities, sample information, and analyzed data on some projects. Its data submission service can be used as a supplement to the literature publishing process to support early data sharing.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
This Web resource provides data and information relevant to SARS coronavirus. It includes links to the most recent sequence data and publications, to other SARS-related resources, and a pre-computed alignment of genome sequences from various isolates. In order to provide free and easy access to genome and protein sequences and associated metadata from SARS-CoV-2, we created a dedicated Severe acute respiratory syndrome coronavirus 2 data hub. You can access the Results Table on the SARS-CoV-2 data hub by pressing the "RefSeq genomes", "nucleotide" or "protein" links on the announcement banner located on the NCBI home page, in the "Find data" navigation menu, or by using the "Up-to-date SARS-CoV-2" shortcut button in the "Search by virus" form. SARS-CoV-2 sequences are part of NCBI Virus: https://www.re3data.org/repository/r3d100014322
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
The NCBI Taxonomy database is a curated set of names and classifications for all of the organisms that are represented in GenBank. The EMBL and DDBJ databases, as well as GenBank, now use the NCBI Taxonomy as the standard classification for nucleotide sequences. Taxonomy contains the names and phylogenetic lineages of more than 160,000 organisms that have molecular data in the NCBI databases. New taxa are added to the Taxonomy database as data are deposited for them. When new sequences are submitted to GenBank, the submission is checked for new organism names, which are then classified and added to the Taxonomy database.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
  • serviceProvider
This site provides access to complete, annotated genomes from bacteria and archaea (present in the European Nucleotide Archive) through the Ensembl graphical user interface (genome browser). Ensembl Bacteria contains genomes from annotated INSDC records that are loaded into Ensembl multi-species databases, using the INSDC annotation import pipeline.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
<<<!!!<<< The repository is offline. >>>!!!>>> GOBASE is a taxonomically broad organelle genome database that organizes and integrates diverse data related to mitochondria and chloroplasts. GOBASE is currently expanding to include information on representative bacteria that are thought to be specifically related to the bacterial ancestors of mitochondria and chloroplasts.
Repository type(s)
  • disciplinary
Provider type(s)
  • dataProvider
GenBase is a genetic sequence database that accepts user submissions (mRNA, genomic DNAs, ncRNA, or small genomes such as organelles, viruses, plasmids, phages from any organism) and integrates data from INSDC.