Filter
Reset all

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 122 result(s)
Content type(s)
A place of living memory, the Phonotheque of the MMSH aims to bring together recordings of the sound heritage that have the value of ethnological, linguistic, historical, musicological or literary information on the Mediterranean area. It documents fields little covered by conventional sources, or completes them with the point of view of actors or witnesses. The collection holds more than 8000 hours of audio archives recorded since the late 1950s concerning all the humanities sciences.
Currently the institute has more than 700 collections consisting of (digital) research data, digitized material, archival collections, printed material, handwritten questionnaires, maps and pictures. The focus is on resources relevant for the study of function, meaning and coherence of cultural expressions and resources relevant for the structural, dialectological and sociolinguistic study of language variation within the Dutch language. An overview is here https://meertens.knaw.nl/en/datasets/
META-SHARE, the open language resource exchange facility, is devoted to the sustainable sharing and dissemination of language resources (LRs) and aims at increasing access to such resources in a global scale. META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role. META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.). Data and tools can be both open and with restricted access rights, free and for-a-fee.
MICASE provides a collection of transcripts of academic speech events recorded at the University of Michigan. The original DAT audiotapes are held in the English Language Institute and may be consulted by bona fide researchers under special arrangements. Additional access: https://lsa.umich.edu/eli/language-resources/micase-micusp.html
Mulce (MUltimodal contextualized Learner Corpus Exchange) is a research project supported by the National Research Agency (ANR programme: "Corpus and Tools in the Humanities", ANR-06-CORP-006). A teaching corpus (LETEC - Learning and Teaching Corpora) combines a systematic and structured data set, particularly of interactional data, and traces left by a training course experimentation, conducted partially or completely online and completed by additional technical, human, pedagogical and scientific information to enable the data to be analysed in context.
Språkbanken is a collection of Norwegian language technology resources, and a national infrastructure for language technology and research. Our mandate is to collect and develop language resources, and to make these available for researchers, students and the ICT industry which works with the development of language-based ICT solutions. Språkbanken was established as a language policy initiative, designed to ensure that language technology solutions based on the Norwegian language will be developed, and thereby prevent domain loss of Norwegian in technology-dependent areas, cf. Mål og meining (Report 35, 2007 – 2008). As of today the collection contains resources in both Norwegian Bokmål and Nynorsk, as well as in Swedish, Danish and Norwegian Sign Language (NTS).
The Royal Library of the Netherlands (Dutch: Koninklijke Bibliotheek or KB; Royal Library) is the national library of the Netherlands. The KB collects everything that is published in and concerning the Netherlands, from medieval literature to today's publications. The e-Depot contains the Dutch National Library Collection of born-digital publications from, and about, the Netherlands, and international publications consisting of born-digital scholarly articles included in journals produced by publishers originally based in the Netherlands
OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. The OLAC system has 2016 been integrated with the Linguistic Linked Open Data Cloud.
The figshare service for The Open University was launched in 2016 and allows researchers to store, share and publish research data. It helps the research data to be accessible by storing metadata alongside datasets. Additionally, every uploaded item receives a Digital Object Identifier (DOI), which allows the data to be citable and sustainable. If there are any ethical or copyright concerns about publishing a certain dataset, it is possible to publish the metadata associated with the dataset to help discoverability while sharing the data itself via a private channel through manual approval.
ORTOLANG is an EQUIPEX project accepted in February 2012 in the framework of investissements d’avenir. Its aim is to construct a network infrastructure including a repository of language data (corpora, lexicons, dictionaries etc.) and readily available, well-documented tools for its processing. Expected outcomes comprize: promoting research on analysis, modelling and automatic processing of our language to their highest international levels thanks to effective resource pooling; facilitating the use and transfer of resources and tools set up within public laboratories to industrial partners, notably SMEs which often cannot develop such resources and tools for language processing given the cost of investment; promoting French language and the regional languages of France by sharing expertise acquired by public laboratories. ORTOLANG is a service for the language, which is complementary to the service offered by Huma-Num (très grande infrastructure de recherche). Ortolang gives access to SLDR for speech, and CNRTL for text resources.
Country
PARADISEC (the Pacific And Regional Archive for Digital Sources in Endangered Cultures) offers a facility for digital conservation and access to endangered materials from all over the world. Our research group has developed models to ensure that the archive can provide access to interested communities, and conforms with emerging international standards for digital archiving. We have established a framework for accessioning, cataloguing and digitising audio, text and visual material, and preserving digital copies. The primary focus of this initial stage is safe preservation of material that would otherwise be lost, especially field tapes from the 1950s and 1960s.
The Phonogrammarchiv is a multi-disciplinary research sound and video archive, covering holdings from all continents. Since its foundation in 1899 the Phonogrammarchiv has been building up its holdings by cooperating with Austrian scholars and archiving their collected material, or by fieldwork conducted by staff members on special topics exploring new fields of methods and contents. The main tasks comprise the production, annotation, cataloguing and long-term preservation of audio-visual field recordings, making the cultural heritage available for future generations and enabling the dissemination of the recordings as well as technical developments in the field of AV recording and storage. Thus the Phonogrammarchiv adds to infrastructural performance valuable to both the scholarly community and the public at large.
The focus of PolMine is on texts published by public institutions in Germany. Corpora of parliamentary protocols are at the heart of the project: Parliamentary proceedings are available for long stretches of time, cover a broad set of public policies and are in the public domain, making them a valuable text resource for political science. The project develops repositories of textual data in a sustainable fashion to suit the research needs of political science. Concerning data, the focus is on converting text issued by public institutions into a sustainable digital format (TEI/XML).
PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, belonging to the Portuguese National Roadmap of Research Infrastructures of Strategic Relevance, and part of the international research infrastructure CLARIN ERIC
The CLARIN­/Text+ repository at the Saxon Academy of Sciences and Humanities in Leipzig offers long­term preservation of digital resources, along with their descriptive metadata. The mission of the repository is to ensure the availability and long­term preservation of resources, to preserve knowledge gained in research, to aid the transfer of knowledge into new contexts, and to integrate new methods and resources into university curricula. Among the resources currently available in the Leipzig repository are a set of corpora of the Leipzig Corpora Collection (LCC), based on newspaper, Wikipedia and Web text. Furthermore several REST-based webservices are provided for a variety of different NLP-relevant tasks The repository is part of the CLARIN infrastructure and part of the NFDI consortium Text+. It is operated by the Saxon Academy of Sciences and Humanities in Leipzig.
The CLARIN-D Centre CEDIFOR provides a repository for long-term storage of resources and meta-data. Resources hosted in the repository stem from research of members as well as associated research projects of CEDIFOR. This includes software and web-services as well as corpora of text, lexicons, images and other data.
The Manchester Romani Project is part of an international network of scholarly projects devoted to research on Romani language and linguistics, coordinated in partnership with Dieter Halwachs (Institute of Linguistics, Graz University and Romani-Projekt Graz), and Peter Bakker (Institute of Linguistics, Aarhus University). The project explores the linguistic features of the dialects of the Romani language, and their distribution in geographical space. An interactive web application is being designed, which will allow users to search and locate on a map different dialectal variants, and to explore how variants cluster in particular regions. Examples sentences and words with sound files will also be made available, to give impressions of dialectal variation within Romani. From the distribution of linguistic forms among the dialects it will be possible to make infeences about social-historical contacts among the Romani communities, and about migration patterns.
Sinmin contains texts of different genres and styles of the modern and old Sinhala language. The main sources of electronic copies of texts for the corpus are online Sinhala newspapers, online Sinhala news sites, Sinhala school textbooks available in online, online Sinhala magazines, Sinhala Wikipedia, Sinhala fictions available in online, Mahawansa, Sinhala Blogs, Sinhala subtitles and Sri lankan gazette.
The South African Centre for Digital Language Resources (SADiLaR) is a national centre supported by the Department of Science and Technology (DST). SADiLaR has an enabling function, with a focus on all official languages of South Africa, supporting research and development in the domains of language technologies and language-related studies in the humanities and social sciences.
Competence Centre IULA-UPF-CC CLARIN manages, disseminates and facilitates this catalogue, which provides access to reference information on the use of language technology projects and studies in different disciplines, especially with regard to Humanities and Social Sciences. The Catalog relates information that is organized by Áreas, (disciplines and research topics), Projects (of research that use or have used language technologies), Tasks (that make the tools), Tools (of language technology), Documentation (articles regarding the tools and how they are used) and resources such as Corpora (collections of annotated texts) and Lexica (collections of words for different uses).
>>>>>!!!<<<<< As of 01/12/2015, deposit of data on SLDR website will be suspended to allow the public opening of Ortolang platform https://www.ortolang.fr/#/market/home .>>>>>!!!<<<<<