  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
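The operators listed above can be combined into a single query string. The following is a minimal sketch that only builds such strings; the helper names (`phrase`, `fuzzy`, `prefix`, `any_of`, `all_of`, `exclude`) are hypothetical and not part of the search site, and the sketch assumes that terms separated by spaces are combined with AND, as the list describes for the default.

```python
from typing import Optional


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Quote text for a phrase search; append ~N if a slop amount is given."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fuzzy(word: str, distance: int) -> str:
    """Append ~N to a word to request the given edit distance."""
    return f"{word}~{distance}"


def prefix(word: str) -> str:
    """Append * to a keyword for a wildcard search."""
    return f"{word}*"


def any_of(*terms: str) -> str:
    """Join terms with | (OR) and group them with parentheses."""
    return "(" + " | ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with spaces, relying on AND being the default (assumption)."""
    return " ".join(terms)


def exclude(term: str) -> str:
    """Prefix a term with - for a NOT operation."""
    return f"-{term}"


# Hypothetical example query: the phrase "spoken language", plus either
# "corpus" or any word starting with "archiv", but not "video".
query = all_of(
    phrase("spoken language"),
    any_of("corpus", prefix("archiv")),
    exclude("video"),
)
print(query)
```

The resulting string can be pasted into the search box; whether the site treats every combination exactly this way is not confirmed by the list above.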
Found 177 result(s)
The repository of the Hamburg Centre for Speech Corpora (HZSK) is used for the archiving, maintenance, distribution and development of spoken language corpora. These usually consist of audio and/or video recordings, transcriptions, other data and structured metadata. The corpora focus on multilingualism and are generally freely available for research and teaching. Most of the corpora maintained by the HZSK were created in the years 2000-2011 within the framework of SFB 538 "Multilingualism" at the University of Hamburg. The HZSK also strives to take in linguistic data from other projects or contexts and to make them available to the scientific community for research and teaching, provided that they are compatible with the current focus of the HZSK, i.e. spoken language and multilingualism in particular.
The Social Science Data Archive is still active and maintained as part of the UCLA Library Data Science Center. SSDA Dataverse is one of the archiving options offered by SSDA; alternatively, data can be archived by SSDA itself, by ICPSR, by the UCLA Library or by the California Digital Library. The Social Science Data Archive serves the UCLA campus as an archive of faculty and graduate student survey research. We provide long-term storage of data files and documentation. We ensure that the data remain usable in the future by migrating files to new operating systems. We follow government standards and archival best practices. The mission of the Social Science Data Archive has been and continues to be to provide a foundation for social science research, with faculty support throughout an entire research project involving original data collection or the reuse of publicly available studies. Data Archive staff and researchers work as partners throughout all stages of the research process: when a hypothesis or area of study is being developed, during grant and funding activities, while data collection and/or analysis is ongoing, and finally in the long-term preservation of research results. Our role is to provide a collaborative environment where the focus is on understanding the nature and scope of the research approach and on managing research output throughout the entire life cycle of the project. Instructional support, especially support that links research with instruction, is also a mainstay of operations.
The gift of the Stowell Datasets, a digital archive of psychographic data, to the College of Liberal Arts (and the continued gift of new datasets) provides a unique opportunity for WSU to facilitate access to a valuable research resource. The datasets include over 350 individual major media market surveys (CATI, random digit dialing telephone surveys) collected over the period 1989-2001 and feature approximately n=1,000+ respondents for each market for each year.
The COordinated Molecular Probe Line Extinction Thermal Emission Survey of Star Forming Regions (COMPLETE) provides a range of data complementary to the Spitzer Legacy Program "From Molecular Cores to Planet Forming Disks" (c2d) for the Perseus, Ophiuchus and Serpens regions. In combination with the Spitzer observations, COMPLETE will allow for detailed analysis and understanding of the physics of star formation on scales from 500 A.U. to 10 pc.
A data repository for researchers affiliated with Ryerson University. This resource is part of the Scholar's Portal Dataverse, which is a service provided by the Ontario Council of University Libraries.
The aim of the CfA Library Datasets Dataverse is to create a better information system that responds to the changing needs of astronomers, not only at the CfA but worldwide. As part of this growing partnership with the ADS, the CfA Library is expanding its metadata and data curation services and, in the process, creating datasets that the astronomy community may find useful. The CfA Library Datasets Dataverse has been created to share these datasets with the greater community in the hope that some members may find them useful. Please remember to acknowledge the CfA Library and the ADS and cite the work using the "Data Citation" presented under each study's "Cataloging Information" section.
The University of Guelph Library maintains two research data repositories to preserve and provide access to datasets and other research materials resulting from University of Guelph research projects. The Agri-environmental Research Data repository preserves and provides access to agricultural and environmental data. The University of Guelph Research Data Repository houses research data from all other disciplines.
Gambling Research Exchange Ontario (GREO) is a knowledge translation and exchange organization that aims to eliminate harm from gambling. Our goal is to support evidence-informed decision making in responsible gambling policies, standards and practices. In line with this mandate, datasets curated in this archive relate to gambling and reducing gambling-related harms.
The Tromsø Repository of Language and Linguistics (TROLLing) is designed as an archive of linguistic data and statistical code. The archive is open access, which means that all information is available to everyone. All postings are accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data (where relevant). Linguists worldwide are invited to post datasets and statistical code used in their linguistic research.
RiuNet is intended to preserve the University community's production, personal or institutional, in collections. These can be made up of different types of documents, such as learning objects (Polimedia, virtual labs and educational articles), theses, journal articles, maps, scholarly works, creative works, institutional heritage, multimedia, teaching material, institutional production, electronic journals, conference proceedings and research data.
TUdatalib is the institutional repository of the TU Darmstadt for research data. It enables the structured storage of research data and descriptive metadata, long-term archiving (at least 10 years) and, if desired, the publication of data including DOI assignment. In addition, it offers fine-grained rights and role management.
The Smithsonian Repository is a digital service that collects, preserves, and disseminates research materials via several communities including Research Data Sets. It preserves and protects the organization's legacy...
The Bavarian Archive for Speech Signals (BAS) is a public institution hosted by the University of Munich. This institution was founded with the aim of making corpora of current spoken German available to both the basic research and the speech technology communities via a maximally comprehensive digital speech-signal database. The speech material will be structured in a manner allowing flexible and precise access, with acoustic-phonetic and linguistic-phonetic evaluation forming an integral part of it.
SciCat allows users to access the metadata of raw and derived data taken at experiment facilities. Scientific datasets are linked to proposals and samples, and can also be linked to publications (DOI, PID). SciCat helps keep track of data provenance (i.e. the steps leading to the final results). SciCat allows users to find data based on the metadata (both their own data and other people's public data). In the long term, SciCat will help to automate scientific analysis workflows.
The Language Archive stores a large body of unique material from a wide variety of languages worldwide, recorded and analyzed by researchers from different linguistic disciplines. It also offers data creation, management and exploration tools, as well as archiving and software expertise for the Digital Humanities.
Mulce (MUltimodal contextualized Learner Corpus Exchange) is a research project supported by the National Research Agency (ANR programme: "Corpus and Tools in the Humanities", ANR-06-CORP-006). A teaching corpus (LETEC - Learning and Teaching Corpora) combines a systematic and structured data set, particularly of interactional data, with the traces left by an experimental training course conducted partially or completely online. It is completed by additional technical, human, pedagogical and scientific information to enable the data to be analysed in context.
The South African Centre for Digital Language Resources (SADiLaR) is a national centre supported by the Department of Science and Technology (DST). SADiLaR has an enabling function, with a focus on all official languages of South Africa, supporting research and development in the domains of language technologies and language-related studies in the humanities and social sciences.
Polish CLARIN node – CLARIN-PL Language Technology Centre – is being built at Wrocław University of Technology. The LTC is addressed to scholars in the humanities and social sciences. Registered users are granted free access to digital language resources and advanced tools to explore them. They can also archive and share their own language data (in written, spoken, video or multimodal form).
The Scientific Database of the Federal University of Paraná aims to gather the scientific data used in research published by the UFPR community in theses, dissertations, journal articles, and other bibliographic materials. BDC joins RDI / UFPR as an innovative service that follows the worldwide trend in research planning, management, production, organization, storage, dissemination and reuse. The availability of research data contributes to the transparency and optimization of scientific production through the reuse of data sets and the possibility of new analyses and approaches.
Earth-Prints is an open archive created and maintained by Istituto Nazionale di Geofisica e Vulcanologia. This digital collection allows users to browse, search and access manuscripts, journal articles, theses, conference materials, books, book chapters and web products. The goal of our repository is to collect, capture, disseminate and preserve the results of research in the fields of Atmosphere, Cryosphere, Hydrosphere and Solid Earth. Earth-Prints is young and growing rapidly.
The Materials Data Facility (MDF) is a set of data services built specifically to support materials science researchers. MDF consists of two synergistic services, data publication and data discovery (in development). The production-ready data publication service offers a scalable repository where materials scientists can publish, preserve, and share research data. The repository provides a focal point for the materials community, enabling publication and discovery of materials data of all sizes.
The project is set up to improve the infrastructure for text-based linguistic research and development by building a huge, automatically annotated German text corpus and the corresponding tools for corpus annotation and exploitation. DeReKo constitutes the largest linguistically motivated collection of contemporary German texts. It contains fictional, scientific and newspaper texts, as well as several other text types; contains only licensed texts; is encoded with rich meta-textual information; is fully annotated morphosyntactically (three concurrent annotations); is continually expanded, with a focus on size and stratification of data; may be analyzed free of charge via the query system COSMAS II; and serves as a 'primordial sample' from which users may draw specialized sub-samples (so-called 'virtual corpora') to represent the language domain they wish to investigate.
B2SAFE is a robust, safe and highly available service which allows community and departmental repositories to implement data management policies on their research data across multiple administrative domains in a trustworthy manner. It is a solution to: provide an abstraction layer which virtualizes large-scale data resources, guard against data loss in long-term archiving and preservation, optimize access for users from different regions, and bring data closer to powerful computers for compute-intensive analysis.
DRO is Deakin University's research repository, providing digital curation by describing and preserving the University's research output and enabling worldwide discovery.