Reset all


Content Types


AID systems



Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type


Metadata standards

PID systems

Provider types

Quality management

Repository languages



Repository types


  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 179 result(s)
Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective.
This is Haldimand County's public platform for exploring and downloading open data, discovering and building apps, and engaging to solve important local issues. You can analyze and combine Open Datasets using maps, as well as develop new web and mobile applications.
An increasing number of Language Resources (LT) in the various fields of Human Language Technology (HLT) are distributed on behalf of ELRA via its operational body ELDA, thanks to the contribution of various players of the HLT community. Our aim is to provide Language Resources, by means of this repository, so as to prevent researchers and developers from investing efforts to rebuild resources which already exist as well as help them identify and access those resources.
<<<!!!<<< The repository is offline >>>!!!>>> A collection of open content name datasets for Information Centric Networking. The "Content Name Collection" (CNC) lists and hosts open datasets of content names. These datasets are either derived from URL link databases or web traces. The names are typically used for research on Information Centric Networking (ICN), for example to measure cache hit/miss ratios in simulations.
Brainlife promotes engagement and education in reproducible neuroscience. We do this by providing an online platform where users can publish code (Apps), Data, and make it "alive" by integragrate various HPC and cloud computing resources to run those Apps. Brainlife also provide mechanisms to publish all research assets associated with a scientific project (data and analyses) embedded in a cloud computing environment and referenced by a single digital-object-identifier (DOI). The platform is unique because of its focus on supporting scientific reproducibility beyond open code and open data, by providing fundamental smart mechanisms for what we refer to as “Open Services.”
ROSA P is the United States Department of Transportation (US DOT) National Transportation Library's (NTL) Repository and Open Science Access Portal (ROSA P). The name ROSA P was chosen to honor the role public transportation played in the civil rights movement, along with one of the important figures, Rosa Parks. To meet the requirements outlined in its legislative mandate, NTL collects research and resources across all modes of transportation and related disciplines, with specific focus on research, data, statistics, and information produced by USDOT, state DOTs, and other transportation organizations. Content types found in ROSA P include textual works, datasets, still image works, moving image works, other multimedia, and maps. These resources have value to federal, state, and local transportation decision makers, transportation analysts, and researchers.
Academic Torrents is a distributed data repository. The academic torrents network is built for researchers, by researchers. Its distributed peer-to-peer library system automatically replicates your datasets on many servers, so you don't have to worry about managing your own servers or file availability. Everyone who has data becomes a mirror for those data so the system is fault-tolerant.
The UWA Profiles and Research Repository contains research publications, research datasets, theses, equipment, grants and activities created by researchers and postgraduates affiliated with the University of Western Australia (UWA). It is managed by the University Library and provides access to research datasets held at UWA. The information about each dataset has been provided by UWA research groups. Dataset metadata is harvested into Research Data Australia (RDA)
Lab Notes Online presents historic scientific data from the Caltech Archives' collections in digital facsimile. Beginning in the fall of 2008, the first publication in the series is Robert A. Millikan's notebooks for his oil drop experiments to measure the charge of the electron, dating from October 1911 to April 1912. Other laboratory, field, or research notes will be added to the archive over time.
FLOSSmole is a collaborative collection of free, libre, and open source software (FLOSS) data. FLOSSmole contains nearly 1 TB of data covering the period 2004 until now, about more than 500,000 different open source projects.
-----<<<<< The repository is no longer available. This record is out-dated. The Matter lab provides the archived database version of 2012 and 2013 at Data linked from the World Community Grid - The Clean Energy Project see at and on fighshare >>>>>----- The Clean Energy Project Database (CEPDB) is a massive reference database for organic semiconductors with a particular emphasis on photovoltaic applications. It was created to store and provide access to data from computational as well as experimental studies, on both known and virtual compounds. It is a free and open resource designed to support researchers in the field of organic electronics in their scientific pursuits. The CEPDB was established as part of the Harvard Clean Energy Project (CEP), a virtual high-throughput screening initiative to identify promising new candidates for the next generation of carbon-based solar cell materials.
Launchpad is a software collaboration platform that provides: Bug tracking, Code hosting using Bazaar, Code reviews Ubuntu package building and hosting, Translations, Mailing lists, Answer tracking and FAQs, Specification tracking. Launchpad can host your project’s source code using the Bazaar version control system
The Google Code Archive contains the data found on the Google Code Project Hosting Service, which turned down in early 2016. This archive contains over 1.4 million projects, 1.5 million downloads, and 12.6 million issues. Google Project Hosting powers Project Hosting on Google Code and Eclipse Labs. Project Hosting on Google Code Eclipse Labs. It provides a fast, reliable, and easy open source hosting service with the following features: Instant project creation on any topic; Git, Mercurial and Subversion code hosting with 2 gigabyte of storage space and download hosting support with 2 gigabytes of storage space; Integrated source code browsing and code review tools to make it easy to view code, review contributions, and maintain a high quality code base; An issue tracker and project wiki that are simple, yet flexible and powerful, and can adapt to any development process; Starring and update streams that make it easy to keep track of projects and developers that you care about.
Content type(s)
A machine learning data repository with interactive visual analytic techniques. This project is the first to combine the notion of a data repository with real-time visual analytics for interactive data mining and exploratory analysis on the web. State-of-the-art statistical techniques are combined with real-time data visualization giving the ability for researchers to seamlessly find, explore, understand, and discover key insights in a large number of public donated data sets. This large comprehensive collection of data is useful for making significant research findings as well as benchmark data sets for a wide variety of applications and domains and includes relational, attributed, heterogeneous, streaming, spatial, and time series data as well as non-relational machine learning data. All data sets are easily downloaded into a standard consistent format. We also have built a multi-level interactive visual analytics engine that allows users to visualize and interactively explore the data in a free-flowing manner.
Cell phones have become an important platform for the understanding of social dynamics and influence, because of their pervasiveness, sensing capabilities, and computational power. Many applications have emerged in recent years in mobile health, mobile banking, location based services, media democracy, and social movements. With these new capabilities, we can potentially be able to identify exact points and times of infection for diseases, determine who most influences us to gain weight or become healthier, know exactly how information flows among employees and productivity emerges in our work spaces, and understand how rumors spread. In an attempt to address these challenges, we release several mobile data sets here in "Reality Commons" that contain the dynamics of several communities of about 100 people each. We invite researchers to propose and submit their own applications of the data to demonstrate the scientific and business values of these data sets, suggest how to meaningfully extend these experiments to larger populations, and develop the math that fits agent-based models or systems dynamics models to larger populations. These data sets were collected with tools developed in the MIT Human Dynamics Lab and are now available as open source projects or at cost.
NKN is now Research Computing and Data Services (RCDS)! We provide data management support for UI researchers and their regional, national, and international collaborators. This support keeps researchers at the cutting-edge of science and increases our institution's competitiveness for external research grants. Quality data and metadata developed in research projects and curated by RCDS (formerly NKN) is a valuable, long-term asset upon which to develop and build new research and science.
CLARIN-LV is a national node of Clarin ERIC (Common Language Resources and Technology Infrastructure). The mission of the repository is to ensure the availability and long­ term preservation of language resources. The data stored in the repository are being actively used and cited in scientific publications.
GnpIS is a multispecies integrative information system dedicated to plant and fungi pests. It bridges genetic and genomic data, allowing researchers access to both genetic information (e.g. genetic maps, quantitative trait loci, association genetics, markers, polymorphisms, germplasms, phenotypes and genotypes) and genomic data (e.g. genomic sequences, physical maps, genome annotation and expression data) for species of agronomical interest. GnpIS is used by both large international projects and plant science departments at the French National Research Institute for Agriculture, Food and Environment. It is regularly improved and released several times per year. GnpIS is accessible through a web portal and allows to browse different types of data either independently through dedicated interfaces or simultaneously using a quick search ('google like search') or advanced search (Biomart, Galaxy, Intermine) tools.
The Centre for the Environment, Fisheries and Aquaculture Science (Cefas), as one of the world's longest-established marine research organisations, has provided advice on the sustainable exploitation of marine resources since 1902. Today Cefas works in support of a healthy environment and a growing blue economy providing innovative solutions for the aquatic environment, biodiversity and food security. The Cefas Data Hub provides access to over 2080 metadata records, with over 5500 data sets available to download and connect to in support of commitments to Open Science through the Data Portal. Datasets available are increasingly diverse and include many legacy datasets including those from fish, shellfish and plankton surveys from the 1980's to the present day. Other increasingly international datasets made available include species migration data from tagging activities and data on habitat and sediment, ecosystem change, human activities including marine litter, otolith sampling and fish stomach contents, oceanography, acoustics, health and water quality. Data is provided under Open Government License by default where feasible.