Filter
Reset all

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 53 result(s)
Competence Centre IULA-UPF-CC CLARIN manages, disseminates and facilitates this catalogue, which provides access to reference information on the use of language technology projects and studies in different disciplines, especially with regard to Humanities and Social Sciences. The Catalog relates information that is organized by Áreas, (disciplines and research topics), Projects (of research that use or have used language technologies), Tasks (that make the tools), Tools (of language technology), Documentation (articles regarding the tools and how they are used) and resources such as Corpora (collections of annotated texts) and Lexica (collections of words for different uses).
Country
The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors (many of them the leading authorities on the subject).
OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.
META-SHARE, the open language resource exchange facility, is devoted to the sustainable sharing and dissemination of language resources (LRs) and aims at increasing access to such resources in a global scale. META-SHARE is an open, integrated, secure and interoperable sharing and exchange facility for LRs (datasets and tools) for the Human Language Technologies domain and other applicative domains where language plays a critical role. META-SHARE is implemented in the framework of the META-NET Network of Excellence. It is designed as a network of distributed repositories of LRs, including language data and basic language processing tools (e.g., morphological analysers, PoS taggers, speech recognisers, etc.). Data and tools can be both open and with restricted access rights, free and for-a-fee.
CLARIN is a European Research Infrastructure for the Humanities and Social Sciences, focusing on language resources (data and tools). It is being implemented and constantly improved at leading institutions in a large and growing number of European countries, aiming at improving Europe's multi-linguality competence. CLARIN provides several services, such as access to language data and tools to analyze data, and offers to deposit research data, as well as direct access to knowledge about relevant topics in relation to (research on and with) language resources. The main tool is the 'Virtual Language Observatory' providing metadata and access to the different national CLARIN centers and their data.
The Language Bank features text and speech corpora with different kinds of annotations in over 60 languages. There is also a selection of tools for working with them, from linguistic analyzers to programming environments. Corpora are also available via web interfaces, and users can be allowed to download some of them. The IP holders can monitor the use of their resources and view user statistics.
Country
PARADISEC (the Pacific And Regional Archive for Digital Sources in Endangered Cultures) offers a facility for digital conservation and access to endangered materials from all over the world. Our research group has developed models to ensure that the archive can provide access to interested communities, and conforms with emerging international standards for digital archiving. We have established a framework for accessioning, cataloguing and digitising audio, text and visual material, and preserving digital copies. The primary focus of this initial stage is safe preservation of material that would otherwise be lost, especially field tapes from the 1950s and 1960s.
Country
The Universitat de Barcelona Digital Repository is an institutional resource containing open-access digital versions of publications related to the teaching, research and institutional activities of the UB's teaching staff and other members of the university community, including research data.
ARCHE (A Resource Centre for the HumanitiEs) is a service aimed at offering stable and persistent hosting as well as dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields. ARCHE - being the successor of the Language Resources Portal (LRP) - is offering language resources as Austria’s connection point to the European network of CLARIN Centres.
Currently, the IMS repository focuses on resources provided by the Institute for Natural Language Processing in Stuttgart (IMS) and other CLARIN-D related institutions such as the local Collaborative Research Centre 732 (SFB 732) as well as institutions and/or organizations that belong to the CLARIN-D extended scientific community. Comprehensive guidelines and workflows for submission by external contributors are being compiled based on the experiences in archiving such in-house resources.
ANPERSANA is the digital library of IKER (UMR 5478), a research centre specialized in Basque language and texts. The online library platform receives and disseminates primary sources of data issued from research in Basque language and culture. As of today, two corpora of documents have been published. The first one, is a collection of private letters written in an 18th century variety of Basque, documented in and transcribed to modern standard Basque. The discovery of the collection, named Le Dauphin, has enabled the emerging of new questions about the history and sociology of writing in the domain of minority languages, not only in France, but also among the whole Atlantic Arc. The second of the two corpora is a selection of sound recordings about monodic chant in the Basque Country. The documents were collected as part of a PhD thesis research work that took place between 2003 and 2012. It's a total of 50 hours of interviews with francophone and bascophone cultural representatives carried out at either their workplace of the informers or in public areas. ANPERSANA is bundled with an advanced search engine. The documents have been indexed and geo-localized on an interactive map. The platform is engaged with open access and all the resources can be uploaded freely under the different Creative Commons (CC) licenses.
The figshare service for The Open University was launched in 2016 and allows researchers to store, share and publish research data. It helps the research data to be accessible by storing metadata alongside datasets. Additionally, every uploaded item receives a Digital Object Identifier (DOI), which allows the data to be citable and sustainable. If there are any ethical or copyright concerns about publishing a certain dataset, it is possible to publish the metadata associated with the dataset to help discoverability while sharing the data itself via a private channel through manual approval.
Country
The Australian National Corpus collates and provides access to assorted examples of Australian English text, transcriptions, audio and audio-visual materials. Text analysis tools are embedded in the interface allowing analysis and downloads in *.CSV format.
CLARINO Bergen Center repository is the repository of CLARINO, the Norwegian infrastructure project . Its goal is to implement the Norwegian part of CLARIN. The ultimate aim is to make existing and future language resources easily accessible for researchers and to bring eScience to humanities disciplines. The repository includes INESS the Norwegian Infrastructure for the Exploration of Syntax and Semantics. This infrastructure provides access to treebanks, which are databases of syntactically and semantically annotated sentences.
The Polinsky Language Sciences Lab at Harvard University is a linguistics lab that examines questions of language structure and its effect on the ways in which people use and process language in real time. We engage in linguistic and interdisciplinary research projects ourselves; offer linguistic research capabilities for undergraduate and graduate students, faculty, and visitors; and build relationships with the linguistic communities in which we do our research. We are interested in a broad range of issues pertaining to syntax, interfaces, and cross-linguistic variation. We place a particular emphasis on novel experimental evidence that facilitates the construction of linguistic theory. We have a strong cross-linguistic focus, drawing upon English, Russian, Chinese, Korean, Mayan languages, Basque, Austronesian languages, languages of the Caucasus, and others. We believe that challenging existing theories with data from as broad a range of languages as possible is a crucial component of the successful development of linguistic theory. We investigate both fluent speakers and heritage speakers—those who grew up hearing or speaking a particular language but who are now more fluent in a different, societally dominant language. Heritage languages, a novel field of linguistic inquiry, are important because they provide new insights into processes of linguistic development and attrition in general, thus increasing our understanding of the human capacity to maintain and acquire language. Understanding language use and processing in real time and how children acquire language helps us improve language study and pedagogy, which in turn improves communication across the globe. Although our lab does not specialize in language acquisition, we have conducted some studies of acquisition of lesser-studied languages and heritage languages, with the purpose of comparing heritage speakers to adults.
By stimulating inspiring research and producing innovative tools, Huygens ING intends to open up old and inaccessible sources, and to understand them better. Huygens ING’s focus is on Digital Humanities, History, History of Science, and Textual Scholarship. Huygens ING pursues research in the fields of History, Literary Studies, the History of Science and Digital Humanities. Huygens ING aims to publish digital sources and data responsibly and with care. Innovative tools are made as widely available as possible. We strive to share the available knowledge at the institute with both academic peers and the wider public.
The aim of the project is systematic mapping of Czech and other languages in comparison with Czech. CNC corpora are accessible to everybody interested in studying the language after free registration.
The CLARIN Centre at the University of Copenhagen, Denmark, hosts and manages a data repository (CLARIN-DK-UCPH Repository), which is part of a research infrastructure for humanities and social sciences financed by the University of Copenhagen, and a part of the national infrastructure collaboration DIGHUMLAB in Denmark. The CLARIN-DK-UCPH Repository provides easy and sustainable access for scholars in the humanities and social sciences to digital language data (in written, spoken, video or multimodal form) and provides advanced tools for discovering, exploring, exploiting, annotating, and analyzing data. CLARIN-DK also shares knowledge on Danish language technology and resources and is the Danish node in the European Research Infrastructure Consortium, CLARIN ERIC.
LINDAT/CLARIN is designed as a Czech “node” of Clarin ERIC (Common Language Resources and Technology Infrastructure). It also supports the goals of the META-NET language technology network. Both networks aim at collection, annotation, development and free sharing of language data and basic technologies between institutions and individuals both in science and in all types of research. The Clarin ERIC infrastructural project is more focused on humanities, while META-NET aims at the development of language technologies and applications. The data stored in the repository are already being used in scientific publications in the Czech Republic.
The Text Laboratory provides assistance with databases, word lists, corpora and tailored solutions for language technology. We also work on research and development projects alone or in cooperation with others - locally, nationally and internationally. Services and tools: Word and frequency lists, Written corpora, Speech corpora, Multilingual corpora, Databases, Glossa Search Tool, The Oslo-Bergen Tagger, GREI grammar games, Audio files: dialects from Norway and America etc., Nordic Atlas of Language Structures (NALS) Journal, Norwegian in America, NEALT, Ethiopian Language Technology, Access to Corpora
Additionally to the institutional repository, current St. Edward's faculty have the option of uploading their work directly to their own SEU accounts on stedwards.figshare.com. Projects created on Figshare will automatically be published on this website as well. For more information, please see documentation
SWE-CLARIN is a national node in European Language and Technology Infrastructure (CLARIN) - an ESFRI initiative to build an infrastructure for e-science in the humanities and social sciences. SWE-CLARIN makes language-based materials available as research data using advanced processing tools and other resources. One basic idea is that the increasing amount of text and speech - contemporary and historical - as digital research material enables new forms of e-science and new ways to tackle old research issues.
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. It was formed in 1992 to address the critical data shortage then facing language technology research and development. Initially, LDC's primary role was as a repository and distribution point for language resources. Since that time, and with the help of its members, LDC has grown into an organization that creates and distributes a wide array of language resources. LDC also supports sponsored research programs and language-based technology evaluations by providing resources and contributing organizational expertise. LDC is hosted by the University of Pennsylvania and is a center within the University’s School of Arts and Sciences.
Cocoon "COllections de COrpus Oraux Numériques" is a technical platform that accompanies the oral resource producers, create, organize and archive their corpus; a corpus can consist of records (usually audio) possibly accompanied by annotations of these records. The resources registered are first cataloged and stored while, and then, secondly archived in the archive of the TGIR Huma-Num. The author and his institution are responsible for filings and may benefit from a restricted and secure access to their data for a defined period, if the content of the information is considered sensitive. The COCOON platform is jointly operated by two joint research units: Laboratoire de Langues et civilisations à tradition orale (LACITO - UMR7107 - Université Paris3 / INALCO / CNRS) and Laboratoire Ligérien de Linguistique (LLL - UMR7270 - Universités d'Orléans et de Tours, BnF, CNRS).