  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) group terms to set precedence
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
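The sketch below is a rough illustration of what `~N` fuzziness means for a single word. It counts edits with plain Levenshtein distance, which is an assumption: the page does not say which distance the search engine uses, and some engines also count swapping two adjacent letters as a single edit. `edit_distance` and `fuzzy_match` are hypothetical helpers written for this example and are not part of the site.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (assumed metric): the minimum number of
    single-character insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, term: str, n: int) -> bool:
    """Hypothetical stand-in for `query~n`: true if term is within n edits."""
    return edit_distance(query.lower(), term.lower()) <= n


if __name__ == "__main__":
    print(edit_distance("learnng", "learning"))   # 1 (one missing letter)
    print(fuzzy_match("turtel", "turtle", 1))     # False: a swap costs 2 here
```

Under this assumed metric, `repositry~1` would match "repository", but `turtel~1` would not match "turtle", because Levenshtein distance counts the swapped "el" as two substitutions. An engine that treats transpositions as one edit would accept it.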
Found 41 result(s)
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. It is used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times.
OpenML is an open ecosystem for machine learning. By organizing all resources and results online, research becomes more efficient, useful and fun. OpenML is a platform to share detailed experimental results with the community at large and organize them for future reuse. Moreover, it will be directly integrated into today’s most popular data mining tools (for now: R, KNIME, RapidMiner and WEKA). Such an easy and free exchange of experiments has tremendous potential to speed up machine learning research, to engender larger, more detailed studies and to offer accurate advice to practitioners. Finally, it will also be a valuable resource for education in machine learning and data mining.
BioGPS is a free gene annotation portal built with two guiding principles in mind: customizability and extensibility. It is a complete resource for learning about gene and protein function.
A machine learning data repository with interactive visual analytic techniques. This project is the first to combine the notion of a data repository with real-time visual analytics for interactive data mining and exploratory analysis on the web. State-of-the-art statistical techniques are combined with real-time data visualization, giving researchers the ability to seamlessly find, explore, understand, and discover key insights in a large number of publicly donated data sets. This large, comprehensive collection is useful both for making significant research findings and as benchmark data sets for a wide variety of applications and domains. It includes relational, attributed, heterogeneous, streaming, spatial, and time series data as well as non-relational machine learning data. All data sets can be easily downloaded in a standard, consistent format. We have also built a multi-level interactive visual analytics engine that allows users to visualize and interactively explore the data in a free-flowing manner.
VADS is the online resource for visual arts. It has provided services to the academic community for 12 years and has built up a considerable portfolio of visual art collections comprising over 100,000 images that are freely available and copyright cleared for use in learning, teaching and research in the UK. VADS provides: expert guidance and help for digital projects in art education; resource development and hosting for art education; project management and consultancy for art education; leadership in the innovative use of ICT in education through its research and development activities. VADS offers advice and guidance to the visual arts research, teaching and learning communities on all aspects of digital resource management from funding, through delivery and use, to preservation.
Mulce (MUltimodal contextualized Learner Corpus Exchange) is a research project supported by the National Research Agency (ANR programme: "Corpus and Tools in the Humanities", ANR-06-CORP-006). A teaching corpus (LETEC, Learning and Teaching Corpora) combines a systematic, structured data set, particularly of interaction data, with the traces left by an experimental training course conducted partly or entirely online. It is supplemented by additional technical, human, pedagogical and scientific information so that the data can be analysed in context.
TurtleSAT is a new website where communities are mapping the location of freshwater turtles in waterways and wetlands across the country. Australia's unique freshwater turtles are in crisis - their numbers are declining and your help is needed to record where you see turtles in your local area.
Social Computing Data Repository hosts data from many different social media sites, most of which have blogging capacity. Prominent social media sites included in this repository are BlogCatalog, Twitter, MyBlogLog, Digg, StumbleUpon, MySpace, LiveJournal, The Unofficial Apple Weblog (TUAW), and Reddit, among others. The repository contains various facets of blog data, including blog site metadata such as user-defined tags, predefined categories, and blog site descriptions; blog post metadata such as user-defined tags and the date and time of posting; blog posts; blog post mood (defined as the blogger's emotions when writing the post); blogger names; blog post comments; and the blogger social network.
The CAD-60 and CAD-120 data sets consist of RGB-D video sequences of humans performing activities, recorded using the Microsoft Kinect sensor. Being able to detect human activities is important for making personal assistant robots useful in performing assistive tasks. Our CAD dataset comprises twelve different activities (composed of several sub-activities) performed by four people in different environments, such as a kitchen, a living room, and an office. The work has been tested on robots that respond reactively to the detected activities.
DLESE is the Digital Library for Earth System Education, a geoscience community resource that supports teaching and learning about the Earth system. It is funded by the National Science Foundation and is being built by a community of educators, students, and scientists to support Earth system education at all levels and in both formal and informal settings. Resources in DLESE include lesson plans, scientific data, visualizations, interactive computer models, and virtual field trips - in short, any web-accessible teaching or learning material. Many of these resources are organized in collections, or groups of related resources that reflect a coherent, focused theme. In many ways, digital collections are analogous to collections in traditional bricks-and-mortar libraries.
A data repository and social network where researchers can interact and collaborate; it also offers tutorials and datasets for learning data science. " is designed for data and the people who work with data. From professional projects to open data, helps you host and share your data, collaborate with your team, and capture context and conclusions as you work."
Data Basin is a science-based mapping and analysis platform that supports learning, research, and sustainable environmental stewardship.
MINDS@UW is designed to gather, distribute, and preserve digital materials related to the University of Wisconsin's research and instructional mission. Content, which is deposited directly by UW faculty and staff, may include research papers and reports, pre-prints and post-prints, datasets and other primary research materials, learning objects, theses, student projects, conference papers and presentations, and other born-digital or digitized research and instructional materials.
BenchSci is a free platform designed to help biomedical research scientists quickly and easily identify validated antibodies from publications. Using filters such as technique, tissue, and cell line, scientists can find, within seconds, published data and the antibodies that match specific experimental contexts. Registration and access are free for academic research scientists.
The online digital research data repository of multi-disciplinary research datasets produced at the University of Nottingham, hosted by Information Services and managed and curated by Libraries, Research & Learning Resources. University of Nottingham researchers who have produced research data associated with an existing or forthcoming publication, or which has potential use for other researchers, are invited to upload their dataset.
IDEALS is an institutional repository that collects, disseminates, and provides persistent and reliable access to the research and scholarship of faculty, staff, and students at the University of Illinois at Urbana-Champaign. Faculty, staff, graduate students, and in some cases undergraduate students, can deposit their research and scholarship directly into IDEALS. Departments can use IDEALS to distribute their working papers, technical reports, or other research material. Contact us at for more information.
Forschungsdatenzentrum für Hochschul- und Wissenschaftsforschung (fdz.DZHW; Research Data Centre for Higher Education Research and Science Studies) provides data from the quantitative and qualitative surveys of the DZHW. In addition, prepared external data from the research field are archived and provided for secondary use. Scientific Use Files are offered for scientific purposes, and Campus Use Files for academic purposes. The documentation is available in German and, for the most part, in English.
York Digital Library (YODL) is a University-wide Digital Library service for multimedia resources used in or created through teaching, research and study at the University of York. YODL complements the University's research publications, held in White Rose Research Online and PURE, and the digital teaching materials in the University's Yorkshare Virtual Learning Environment. YODL contains a range of collections, including images, past exam papers, masters dissertations and audio. Some of these are available only to members of the University of York, whilst other material is available to the public. YODL is expanding, with more content being added all the time.
Kenya Open Data offers visualization tools, data downloads, and easy access for software developers. Kenya Open Data makes core government development, demographic, statistical and expenditure data available to researchers, policymakers, developers and the general public. Kenya is the first developing country to have an open government data portal, the first in sub-Saharan Africa and second on the continent after Morocco. The initiative has been widely acclaimed globally as one of the most significant steps Kenya has made to improve governance and implement the new Constitution’s provisions on access to information.
The University of Oxford Text Archive develops, collects, catalogues and preserves electronic literary and linguistic resources for use in Higher Education, in research, teaching and learning. We also give advice on the creation and use of these resources, and are involved in the development of standards and infrastructure for electronic language resources.
Child Care & Early Education Research Connections promotes high quality research in child care and early education and the use of that research in policy making. Our vision is that children are well cared for and have rich learning experiences, and their families are supported and able to work. Through this Web site, we offer research and data resources for researchers, policy makers, practitioners, and others.
The Finnish Social Science Data Archive (FSD) is a national resource centre for social science research and teaching. FSD archives, promotes and disseminates digital research data for research, teaching and learning purposes. Data descriptions are published in Finnish and English. Quantitative datasets are translated from Finnish to English on request, and several datasets are already available in English. All services are free of charge. FSD promotes open access to research data, and transparency, accumulation and efficient reuse of scientific research. FSD is a national Service Provider for CESSDA ERIC.
RUresearch Data Portal, a subset of RUcore (Rutgers University Community Repository), provides a platform for Rutgers researchers to share their research data and supplementary resources with the global scholarly community. This data portal leverages all the capabilities of RUcore, with additional tools and services specific to research data. It organizes data into clusters (research genres), such as experimental, multivariate, discrete, continuous, and time series data, with an excellent search facility. It also supports individual research portals, including the Video Mosaic Collaborative (VMC), an NSF-funded collection of mathematics education videos for teaching and research. Its mission is to maintain the significant intellectual property of Rutgers University and to provide open access and the greatest possible impact for digital data collections in a responsible manner, in order to promote research and learning.