Found 15 result(s)
The Language Bank features text and speech corpora with different kinds of annotations in over 60 languages. It also offers a selection of tools for working with them, from linguistic analyzers to programming environments. The corpora are available via web interfaces, and users can be given permission to download some of them. IP holders can monitor the use of their resources and view user statistics.
GovData, the data portal for Germany, offers consistent and central access to administrative data at the federal, state, and local level. Its objective is to make data more available and easier to use from a single location. In line with the concept of "open data", we aim to facilitate the use of open licenses and to increase the supply of machine-readable raw data.
CLARIN is a European Research Infrastructure for the Humanities and Social Sciences, focusing on language resources (data and tools). It is being implemented and continuously improved at leading institutions in a large and growing number of European countries, with the aim of improving Europe's multilingual competence. CLARIN provides several services, such as access to language data and to tools for analyzing them, the possibility to deposit research data, and direct access to knowledge about topics related to research on and with language resources. The main tool is the Virtual Language Observatory (VLO), which provides metadata for, and access to, the different national CLARIN centers and their data.
The Research Collection is ETH Zurich's publication platform. It unites the functions of a university bibliography, an open access repository and a research data repository within one platform. Researchers who are affiliated with ETH Zurich, the Swiss Federal Institute of Technology, may deposit research data from all domains. They can publish data as a standalone publication, publish it as supplementary material for an article, dissertation or another text, share it with colleagues or a research group, or deposit it for archiving purposes. Research-data-specific features include flexible access rights settings, DOI registration and a DOI preview workflow, content previews for zip- and tar-containers, as well as download statistics and altmetrics for published data. All data uploaded to the Research Collection are also transferred to the ETH Data Archive, ETH Zurich’s long-term archive.
ARCHE (A Resource Centre for the HumanitiEs) is a service that offers stable and persistent hosting as well as dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields. As the successor of the Language Resources Portal (LRP), ARCHE also offers language resources and serves as Austria's connection point to the European network of CLARIN Centres.
Stanford Network Analysis Platform (SNAP) is a general-purpose network analysis and graph mining library. It is written in C++ and easily scales to massive networks with hundreds of millions of nodes and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. SNAP is also available through NodeXL, a graphical front-end that integrates network analysis into Microsoft Office and Excel. The SNAP library has been actively developed since 2004 and has grown organically as a result of our research in the analysis of large social and information networks. The largest network we have analyzed with the library so far is the 2006 Microsoft Instant Messenger network, with 240 million nodes and 1.3 billion edges. The datasets available on the website were mostly collected (scraped) for the purposes of our research. The website was launched in July 2009.
CaltechDATA is an institutional data repository for Caltech. The Caltech Library runs the repository to preserve the accomplishments of Caltech researchers and share their results with the world. Caltech-associated researchers can upload data, link data with their publications, and assign a permanent DOI so that others can reference the dataset. The repository also preserves software and offers automatic GitHub integration. All files in the repository are either open access or embargoed, and all metadata is always publicly available.
ETH Data Archive is ETH Zurich's long-term preservation solution for digital information such as research data, documents or images. It serves as the backbone of data curation and, for most of its content, is a "dark archive" without public access. In this capacity, the ETH Data Archive also archives the content of ETH Zurich's Research Collection, which is the primary repository for members of the university and the first point of contact for publishing data at ETH Zurich. All data produced in the context of research at ETH Zurich can be published and archived in the Research Collection. However, a direct data upload into the ETH Data Archive should be considered in the following cases:
  • Software code has to be uploaded and registered according to ETH transfer's requirements for Software Disclosure.
  • A substantial number of files has to be submitted regularly for long-term archiving and/or publishing, and browser-based upload is not an option. In this case, the ETH Data Archive may offer automated data and metadata transfers from source applications (e.g. from a LIMS) via API.
  • Files for a project have to be collected on a local computer and described with metadata before they are uploaded to the ETH Data Archive. For this purpose, the local file editor docuteam packer is provided. Docuteam packer allows depositors to structure, describe, and organise data for an upload into the ETH Data Archive, and the depositor decides when the submission is due.
FLOSSmole is a collaborative collection of free, libre, and open source software (FLOSS) data. FLOSSmole contains nearly 1 TB of data, covering the period from 2004 until now, on more than 500,000 different open source projects.
myExperiment is a collaborative environment where scientists can safely publish their workflows and in silico experiments, share them with groups and find those of others. Workflows, other digital objects and bundles (called Packs) can be swapped, sorted and searched like photos and videos on the Web. Unlike Facebook or MySpace, myExperiment is designed around the needs of researchers and makes it easy for the next generation of scientists to contribute to a pool of scientific methods, build communities and form relationships, thereby reducing time-to-experiment, sharing expertise and avoiding reinvention. myExperiment is now the largest public repository of scientific workflows.
The goal of the Center of Estonian Language Resources (CELR) is to create and manage an infrastructure that makes Estonian digital language resources (dictionaries, text and speech corpora, and various language databases) and language technology tools (software) available to everyone working with digital language materials. CELR coordinates and organises the documentation and archiving of the resources, develops language technology standards, and draws up the necessary legal contracts and licences for different types of users (public, academic, commercial, etc.). In addition to collecting language resources, CELR will launch a system for introducing the resources to potential users and for informing and educating them. The main users of CELR are researchers from Estonian R&D institutions and, via the CLARIN ERIC network of similar centers in Europe, Social Sciences and Humanities researchers all over the world. Access to data is provided through different sites: Public Repository, Language resources, and MetaShare CELR.
clarin:el is the Greek national network of language resources, a nation-wide Research Infrastructure devoted to the sustainable storage, sharing, dissemination and preservation of language resources (LRs). It aims at increasing access to and augmentation of such resources at a national scale and beyond. It is an open, integrated, secure and interoperable storage, sharing and processing infrastructure for LRs (datasets, tools and processing services) for all domains and disciplines where language plays a critical role. CLARIN EL is implemented in the framework of CLARIN Attiki, a national project in support of ESFRI/2006 Research Infrastructures.
CLARIN.SI is the Slovenian node of the European CLARIN (Common Language Resources and Technology Infrastructure) network of centres. The CLARIN.SI repository is hosted at the Jožef Stefan Institute and offers long-term preservation of deposited linguistic resources, along with their descriptive metadata. The integration of the repository with the CLARIN infrastructure gives the deposited resources wide exposure, so that they can be known, used and further developed beyond the lifetime of the projects in which they were produced. Among the resources currently available in the CLARIN.SI repository are the multilingual MULTEXT-East resources, the CC version of the Slovenian reference corpus Gigafida, the morphological lexicon Sloleks, the IMP corpora and lexicons of historical Slovenian, as well as many other resources for a variety of languages. Furthermore, several REST-based web services are provided for different corpus-linguistic and NLP tasks.
The ILC-CNR for CLARIN-IT repository is a library for linguistic data and tools, covering the following areas: Text Processing and Computational Philology; Natural Language Processing and Knowledge Extraction; Resources, Standards and Infrastructures; and Computational Models of Language Usage. The studies carried out within each area are highly interdisciplinary and involve different professional skills and expertise that extend across the disciplines of Linguistics, Computational Linguistics, Computer Science and Bio-Engineering.
figshare allows researchers to publish all of their research outputs in an easily citable, sharable and discoverable manner. All file formats can be published, including videos and datasets. An optional peer review process is available. figshare uses Creative Commons licensing.