Content Types


AID systems



Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type


Metadata standards

PID systems

Provider types

Quality management

Repository languages



Repository types


  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 50 result(s)
SWE-CLARIN is a national node in European Language and Technology Infrastructure (CLARIN) - an ESFRI initiative to build an infrastructure for e-science in the humanities and social sciences. SWE-CLARIN makes language-based materials available as research data using advanced processing tools and other resources. One basic idea is that the increasing amount of text and speech - contemporary and historical - as digital research material enables new forms of e-science and new ways to tackle old research issues.
ILC-CNR for CLARIN-IT repository is a library for linguistic data and tools. Including: Text Processing and Computational Philology; Natural Language Processing and Knowledge Extraction; Resources, Standards and Infrastructures; Computational Models of Language Usage. The studies carried out within each area are highly interdisciplinary and involve different professional skills and expertises that extend across the disciplines of Linguistics, Computational Linguistics, Computer Science and Bio-Engineering.
Chempound is a new generation repository architecture based on RDF, semantic dictionaries and linked data. It has been developed to hold any type of chemical object expressible in CML and is exemplified by crystallographic experiments and computational chemistry calculations. In both examples, the repository can hold >50k entries which can be searched by SPARQL endpoints and pre-indexing of key fields. The Chempound architecture is general and adaptable to other fields of data-rich science.
!!!! <<<< The Community Data Portal (CDP) has been retired after nearly 15 years of service and is no longer available. Data can now be found here: DASH Search: . Please contact us with questions or concerns: >>>> !!!! The Community Data Portal (CDP) is a collection of earth science datasets from NCAR, UCAR, UOP, and participating organizations.
N U C A S T R O D A T A . O R G is your WWW resource for utilizing nuclear information in studies of astrophysical systems. This site hyperlinks all online nuclear astrophysics datasets, hosts the Computational Infrastructure for Nuclear Astrophysics, and provides a mechnanism for researchers to share files online. We created the first online "cloud computing" system for nuclear astrophysics, a virtual pipeline that enables results from the nuclear laboratory to be rapidly incorporated into astrophysical simulations. This system, the Computational Infrastructure for Nuclear Astrophysics or CINA, came online at
MassIVE is a community resource developed by the NIH-funded Center for Computational Mass Spectrometry to promote the global, free exchange of mass spectrometry data. MassIVE datasets can be assigned ProteomeXchange accessions to satisfy publication requirements.
Here you will find a collection of atomic microstructures that have been built by the atomic modeling community. Feel free to download any of these and use them in your own scientific explorations.The focus of this cyberinfrastructure is to advance the field of atomic-scale modeling of materials by acting as a forum for disseminating new atomistic scale methodologies, educating non-experts and the next generation of computational materials scientists, and serving as a bridge between the atomistic and complementary (electronic structure, mesoscale) modeling communities.
The repository was esteblished to host, organize, and share materials data. It contains ab initio electronic-structure data from density-functional theory and methods beyond.
This website makes data available from the first round of data sharing projects that were supported by the CRCNS funding program. To enable concerted efforts in understanding the brain experimental data and other resources such as stimuli and analysis tools should be widely shared by researchers all over the world. To serve this purpose, this website provides a marketplace and discussion forum for sharing tools and data in neuroscience. To date we host experimental data sets of high quality that will be valuable for testing computational models of the brain and new analysis methods. The data include physiological recordings from sensory and memory systems, as well as eye movement data. is the premier place for computational nanotechnology research, education, and collaboration. Our site hosts a rapidly growing collection of Simulation Programs for nanoscale phenomena that run in the cloud and are accessible through a web browser. In addition to simulation devices, nanoHUB provides Online Presentations, Courses, Learning Modules, Podcasts, Animations, Teaching Materials, and more. These resources help users learn about our simulation programs and about nanotechnology in general. Our site offers researchers a venue to explore, collaborate, and publish content, as well. Much of these collaborative efforts occur via Workspaces and User groups.
The English Lexicon Project (supported by the National Science Foundation) affords access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.
The Plant Metabolic Network (PMN) provides a broad network of plant metabolic pathway databases that contain curated information from the literature and computational analyses about the genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism in plants. The PMN currently houses one multi-species reference database called PlantCyc and 22 species/taxon-specific databases.
The NCI National Research Data Collection is Australia’s largest collection of research data, encompassing more than 10 PB of nationally and internationally significant datasets. will provide a comprehensive environment for experimental, theoretical, and computational engineering and science, providing a place not only to steward data from its creation through archive, but also the workspace in which to understand, analyze, collaborate and publish that data. At the heart of the cyberinfrastructure, the Data Depot is the central shared data repository that supports the full research lifecycle, from data creation to analysis to curation and publication. The Data Depot will accept any data the user wishes to supply into a local workspace, even if the data type is unknown or only partial metadata is provided. The Discovery Workspace will be a web-based environment that provides researchers with access to data analysis tools, computational simulation tools, visualization tools, educational tools, and user-contributed tools within the cloud to support research workflows, learning, and discovery. The Reconnaissance Integration Portal will be the main access point to data collected during the reconnaissance of windstorm and earthquake events.
The Basis Set Exchange (BSE) provides a web-based user interface for downloading and uploading Gaussian-type (GTO) basis sets, including effective core potentials (ECPs), from the EMSL Basis Set Library. It provides an improved user interface and capabilities over its predecessor, the EMSL Basis Set Order Form, for exploring the contents of the EMSL Basis Set Library. The popular Basis Set Order Form and underlying Basis Set Library were originally developed by Dr. David Feller and have been available from the EMSL webpages since 1994.
DLESE is the Digital Library for Earth System Education, a geoscience community resource that supports teaching and learning about the Earth system. It is funded by the National Science Foundation and is being built by a community of educators, students, and scientists to support Earth system education at all levels and in both formal and informal settings. Resources in DLESE include lesson plans, scientific data, visualizations, interactive computer models, and virtual field trips - in short, any web-accessible teaching or learning material. Many of these resources are organized in collections, or groups of related resources that reflect a coherent, focused theme. In many ways, digital collections are analogous to collections in traditional bricks-and-mortar libraries.
The RESID Database of Protein Modifications is a comprehensive collection of annotations and structures for protein modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link post-translational modifications.
The CLARIN­-D repository at the University of Leipzig offers long­term preservation of digital resources, along with their descriptive metadata. The mission of the repository is to ensure the availability and long­term preservation of resources, to preserve knowledge gained in research, to aid the transfer of knowledge into new contexts, and to integrate new methods and resources into university curricula. Among the resources currently available in the Leipzig repository are a set of corpora of the Leipzig Corpora Collection (LCC), based on newspaper, Wikipedia and Web text. Furthermore several REST-based webservices are provided for a variety of different NLP-relevant tasks
OpenML is an open ecosystem for machine learning. By organizing all resources and results online, research becomes more efficient, useful and fun. OpenML is a platform to share detailed experimental results with the community at large and organize them for future reuse. Moreover, it will be directly integrated in today’s most popular data mining tools (for now: R, KNIME, RapidMiner and WEKA). Such an easy and free exchange of experiments has tremendous potential to speed up machine learning research, to engender larger, more detailed studies and to offer accurate advice to practitioners. Finally, it will also be a valuable resource for education in machine learning and data mining.