Filter
Reset all

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 122 result(s)
The Alaska Native Language Archive houses documentation of the various Native languages of Alaska and helps to preserve and cultivate this unique heritage for future generations. As the premier repository worldwide for information relating to the Native languages of Alaska, the Archive serves researchers, teachers and students, as well as members of the broader community. The collection includes both published and unpublished materials in or on all of the Alaska Native languages and related languages. The collection has enduring cultural, historic, and intellectual value, particularly for Alaska Native language speakers and their descendants
ANPERSANA is the digital library of IKER (UMR 5478), a research centre specialized in Basque language and texts. The online library platform receives and disseminates primary sources of data issued from research in Basque language and culture. As of today, two corpora of documents have been published. The first one, is a collection of private letters written in an 18th century variety of Basque, documented in and transcribed to modern standard Basque. The discovery of the collection, named Le Dauphin, has enabled the emerging of new questions about the history and sociology of writing in the domain of minority languages, not only in France, but also among the whole Atlantic Arc. The second of the two corpora is a selection of sound recordings about monodic chant in the Basque Country. The documents were collected as part of a PhD thesis research work that took place between 2003 and 2012. It's a total of 50 hours of interviews with francophone and bascophone cultural representatives carried out at either their workplace of the informers or in public areas. ANPERSANA is bundled with an advanced search engine. The documents have been indexed and geo-localized on an interactive map. The platform is engaged with open access and all the resources can be uploaded freely under the different Creative Commons (CC) licenses.
ARCHE (A Resource Centre for the HumanitiEs) is a service aimed at offering stable and persistent hosting as well as dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields. ARCHE is the successor of the Language Resources Portal (LRP) and acts as Austria’s connection point to the European network of CLARIN Centres for language resources.
Country
Arquivo.pt is a research infrastructure that preserves millions of files collected from the web since 1996 and provides a public search service over this information. It contains information in several languages. Periodically it collects and stores information published on the web. Then, it processes the collect data to make it searchable, providing a “Google-like” service that enables searching the past web (English user interface available at https://arquivo.pt/?l=en). This preservation workflow is performed through a large-scale distributed information system and can also accessed through API (https://arquivo.pt/api).
Country
The Australian National Corpus collates and provides access to assorted examples of Australian English text, transcriptions, audio and audio-visual materials. Text analysis tools are embedded in the interface allowing analysis and downloads in *.CSV format.
The University research data repository – BathSPAdata – enables staff to upload their research data into a secure space, and to share this data publicly where appropriate, or where funders or publishers require this as part of their conditions. Resources and toolkits for external use can be made available through this forum, and can be used by Schools, policy makers, business and industry, and the cultural sector.
The Bavarian Archive for Speech Signals (BAS) is a public institution hosted by the University of Munich. This institution was founded with the aim of making corpora of current spoken German available to both the basic research and the speech technology communities via a maximally comprehensive digital speech-signal database. The speech material will be structured in a manner allowing flexible and precise access, with acoustic-phonetic and linguistic-phonetic evaluation forming an integral part of it.
Country
The program "Humanist Virtual Libraries" distributes heritage documents and pursues research associating skills in human sciences and computer science. It aggregates several types of digital documents: A selection of facsimiles of Renaissance works digitized in the Central Region and in partner institutions, the Epistemon Textual Database, which offers digital editions in XML-TEI, and Transcripts or analyzes of notarial minutes and manuscripts
Country
bonndata is the institutional, FAIR-aligned and curated, cross-disciplinary research data repository for the publication of research data for all researchers at the University of Bonn. The repository is fully embedded into the University IT and Data Center and curated by the Research Data Service Center (https://www.forschungsdaten.uni-bonn.de/en). The software that bonndata is based on is the open source software Dataverse (https://dataverse.org)
Country
Based on Bowerman & Pederson’s Topological Relations Picture Series (TRPS), the author has researched the semantic space of static spatial prepositions of Hieroglyphic Ancient Egyptian (Egyptian, Afro-Asiatic), Arabic, English, French, German, Hebrew, Italian, Russian, and Spanish. This repository publication publishes the raw data.
The Buckeye Corpus of conversational speech contains high-quality recordings from 40 speakers in Columbus OH conversing freely with an interviewer. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is currently being written.
The platform hosts the critical edition of the letters written to Jacob Burckhardt, reconstructing in open access one of the most important European correspondences of the 19th century. Save a few exceptions, these letters are all unpublished. On a later stage, the project aims to publish also Jacob Burckhardt’s letters. The editing process has been carried out using Muruca semantic digital library framework. The Muruca framework has been modified over the project, as the requirements of the philological researchers emerged more clearly. The results are stored in and accessible from the front-end of the platform.
The goal of the Center of Estonian Language Resources (CELR) is to create and manage an infrastructure to make the Estonian language digital resources (dictionaries, corpora – both text and speech –, various language databases) and language technology tools (software) available to everyone working with digital language materials. CELR coordinates and organises the documentation and archiving of the resources as well as develops language technology standards and draws up necessary legal contracts and licences for different types of users (public, academic, commercial, etc.). In addition to collecting language resources, a system will be launched for introducing the resources to, informing and educating the potential users. The main users of CELR are researchers from Estonian R&D institutions and Social Sciences and Humanities researchers all over the world via the CLARIN ERIC network of similar centers in Europe. Access to data is provided through different sites: Public Repository https://entu.keeleressursid.ee/public-document , Language resources https://keeleressursid.ee/en/resources/corpora, and MetaShare CELR https://metashare.ut.ee/
Country
Created in 2005 by the CNRS, CNRTL unites in a single portal, a set of linguistic resources and tools for language processing. The CNRTL includes the identification, documentation (metadata), standardization, storage, enhancement and dissemination of resources. The sustainability of the service and the data is guaranteed by the backing of the UMR ATILF (CNRS - Université Nancy), support of the CNRS and its integration in the excellence equipment project ORTOLANG .
CHILDES is the child language component of the TalkBank system. TalkBank is a system for sharing and studying conversational interactions.
CLAPOP is the portal of the Dutch CLARIN community. It brings together all relevant resources that were created within the CLARIN NL project and that now are part of the CLARIN NL infrastructure or that were created by other projects but are essential for the functioning of the CLARIN (NL) infrastructure. CLARIN-NL has closely cooperated with CLARIN Flanders in a number of projects. The common results of this cooperation and the results of this cooperation created by CLARIN Flanders are included here as well.
Content type(s)
>>>!!!<<<<ARCHE https://www.re3data.org/repository/r3d100012523 is the successor of a repository project established in 2014 as CLARIN Centre Vienna / Language Resources Portal (CCV/LRP). The mission of CCV/LRP was to provide depositing services and easy and sustainable access to digital language resources created in Austria. ARCHE replaces CCV/LRP and extends its mission by offering an advanced and reliable data management and depositing service open to a broader range of humanities fields in Austria. >>>!!!<<<
The focus of CLARIN INT Portal is on resources that are relevant to the lexicological study of the Dutch language and on resources relevant for research in and development of language and speech technology. For Example: lexicons, lexical databases, text corpora, speech corpora, language and speech technology tools, etc. The resources are: Cornetto-LMF (Lexicon Markup Framework), Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands), Corpus Gysseling, Corpus VU-DNC (VU University Diachronic News text Corpus), Dictionary of the Frisian Language (Woordenboek der Friese Taal), DuELME-LMF (Lexicon Markup Framework), Language Portal (Taalportaal), Namescape, NERD (Named Entity Recognition and Disambiguation) and TICCLops (Text-Induced Corpus Clean-up online processing system).
The knowledge centre is an information service offering advice on the use of digital language resources and tools for Swedish and other languages in Sweden, as well as other parts of the intangible cultural heritage of Sweden.
The repository is part of the National Research Data Infrastructure initiative Text+, in which the University of Tübingen is a partner. It is housed at the Department of General and Computational Linguistics. The infrastructure is maintained in close cooperation with the Digital Humanities Centre, which is a core facility of the university, colaborating with the library and computing center of the university. Integration of the repository into the national CLARIN-D and international CLARIN infrastructures gives it wide exposure, increasing the likelihood that the resources will be used and further developed beyond the lifetime of the projects in which they were developed. Among the resources currently available in the Tübingen Center Repository, researchers can find widely used treebanks of German (e.g. TüBa-D/Z), the German wordnet (GermaNet), the first manually annotated digital treebank (Index Thomisticus), as well as descriptions of the tools used by the WebLicht ecosystem for natural language processing.
Content type(s)
The Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) is a CLARIN partner institution and has been an officially certified CLARIN service center since June 20th, 2013. The CLARIN center at the BBAW focuses on historical text corpora (predominantly provided by the 'Deutsches Textarchiv'/German Text Archive, DTA) as well as on lexical resources (e.g. dictionaries provided by the 'Digitales Wörterbuch der Deutschen Sprache'/Digital Dictionary of the German Language, DWDS).
The CLARIN Centre at the University of Copenhagen, Denmark, hosts and manages a data repository (CLARIN-DK-UCPH Repository), which is part of a research infrastructure for humanities and social sciences financed by the University of Copenhagen. The CLARIN-DK-UCPH Repository provides easy and sustainable access for scholars in the humanities and social sciences to digital language data (in written, spoken, video or multimodal form) and provides advanced tools for discovering, exploring, exploiting, annotating, and analyzing data. CLARIN-DK also shares knowledge on Danish language technology and resources and is the Danish node in the European Research Infrastructure Consortium, CLARIN ERIC.
CLARIN is a European Research Infrastructure for the Humanities and Social Sciences, focusing on language resources (data and tools). It is being implemented and constantly improved at leading institutions in a large and growing number of European countries, aiming at improving Europe's multi-linguality competence. CLARIN provides several services, such as access to language data and tools to analyze data, and offers to deposit research data, as well as direct access to knowledge about relevant topics in relation to (research on and with) language resources. The main tool is the 'Virtual Language Observatory' providing metadata and access to the different national CLARIN centers and their data.
Iceland joined CLARIN ERIC on February 1st, 2020, after having been an observer since November 2018. The Ministry of Education, Science and Culture assigned The Árni Magnússon Institute for Icelandic Studies the role of leading partner in the Icelandic National Consortium and appointed Professor Emeritus Eiríkur Rögnvaldsson as National Coordinator, later replaced by Starkaður Barkarson, a project manager at The Árni Magnússon Institute. Most of the relevant institutions participate in the CLARIN-IS National Consortium. The Árni Magnússon Institute has already established a Metadata Providing Centre (CLARIN C-Centre) which hosts metadata for Icelandic language resources and makes them available through the Virtual Language Observatory. The aim is to establish a Service Providing Centre (CLARIN B-Centre) which will provide both service and access to resources and knowledge.
Lithuania became a full member of CLARIN ERIC in January of 2015 and soon CLARIN-LT consortium was founded by three partner universities: Vytautas Magnus University, Kaunas Technology University and Vilnius University. The main goal of the consortium is to become a CLARIN B centre, which will be able to serve language users in Lithuania and Europe for storing and accessing language resources.