Filter

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 60 result(s)
Språkbanken is a collection of Norwegian language technology resources, and a national infrastructure for language technology and research. Our mandate is to collect and develop language resources, and to make these available for researchers, students and the ICT industry which works with the development of language-based ICT solutions. Språkbanken was established as a language policy initiative, designed to ensure that language technology solutions based on the Norwegian language will be developed, and thereby prevent domain loss of Norwegian in technology-dependent areas, cf. Mål og meining (Report 35, 2007 – 2008). As of today the collection contains resources in both Norwegian Bokmål and Nynorsk, as well as in Swedish, Danish and Norwegian Sign Language (NTS).
The goal of the Center of Estonian Language Resources (CELR) is to create and manage an infrastructure to make the Estonian language digital resources (dictionaries, corpora – both text and speech –, various language databases) and language technology tools (software) available to everyone working with digital language materials. CELR coordinates and organises the documentation and archiving of the resources as well as develops language technology standards and draws up necessary legal contracts and licences for different types of users (public, academic, commercial, etc.). In addition to collecting language resources, a system will be launched for introducing the resources to, informing and educating the potential users. The main users of CELR are researchers from Estonian R&D institutions and Social Sciences and Humanities researchers all over the world via the CLARIN ERIC network of similar centers in Europe. Access to data is provided through different sites: Public Repository https://entu.keeleressursid.ee/public-document , Language resources https://keeleressursid.ee/en/resources/corpora, and MetaShare CELR https://metashare.ut.ee/
The focus of CLARIN INT Portal is on resources that are relevant to the lexicological study of the Dutch language and on resources relevant for research in and development of language and speech technology. For Example: lexicons, lexical databases, text corpora, speech corpora, language and speech technology tools, etc. The resources are: Cornetto-LMF (Lexicon Markup Framework), Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands), Corpus Gysseling, Corpus VU-DNC (VU University Diachronic News text Corpus), Dictionary of the Frisian Language (Woordenboek der Friese Taal), DuELME-LMF (Lexicon Markup Framework), Language Portal (Taalportaal), Namescape, NERD (Named Entity Recognition and Disambiguation) and TICCLops (Text-Induced Corpus Clean-up online processing system).
Språkbanken (the Swedish Language Bank) was established in 1975 as a national center located in the Faculty of Arts, University of Gothenburg. Alléns groundbreaking corpus linguistic research resulted in the creation of one of the first large electronic text corpora in another language than English, with one million words of newspaper text. The task of Språkbanken is to collect, develop, and store (Swedish) text corpora, and to make linguistic data extracted from the corpora available to researchers and to the public.
The Language Archive Cologne (LAC) is a research data repository for the linguistics and all humanities disciplines working with audiovisual data. The archive forms a cluster of the Data Center for Humanities in cooperation with the Institute of Linguistics of the University of Cologne. The LAC is an archive for language resources, which is freely available via a web-based access. In addition, concrete technical and methodological advice is offered in the research data cycle - from the collection of the data, their preparation and archiving, to publication and reuse.
LINDAT/CLARIN is designed as a Czech “node” of Clarin ERIC (Common Language Resources and Technology Infrastructure). It also supports the goals of the META-NET language technology network. Both networks aim at collection, annotation, development and free sharing of language data and basic technologies between institutions and individuals both in science and in all types of research. The Clarin ERIC infrastructural project is more focused on humanities, while META-NET aims at the development of language technologies and applications. The data stored in the repository are already being used in scientific publications in the Czech Republic.
SWE-CLARIN is a national node in European Language and Technology Infrastructure (CLARIN) - an ESFRI initiative to build an infrastructure for e-science in the humanities and social sciences. SWE-CLARIN makes language-based materials available as research data using advanced processing tools and other resources. One basic idea is that the increasing amount of text and speech - contemporary and historical - as digital research material enables new forms of e-science and new ways to tackle old research issues.
The University of Oxford Text Archive develops, collects, catalogues and preserves electronic literary and linguistic resources for use in Higher Education, in research, teaching and learning. We also give advice on the creation and use of these resources, and are involved in the development of standards and infrastructure for electronic language resources.
CLARIN-UK is a consortium of centres of expertise involved in research and resource creation involving digital language data and tools. The consortium includes the national library, and academic departments and university centres in linguistics, languages, literature and computer science.
Polish CLARIN node – CLARIN-PL Language Technology Centre – is being built at Wrocław University of Technology. The LTC is addressed to scholars in the humanities and social sciences. Registered users are granted free access to digital language resources and advanced tools to explore them. They can also archive and share their own language data (in written, spoken, video or multimodal form).
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. It was formed in 1992 to address the critical data shortage then facing language technology research and development. Initially, LDC's primary role was as a repository and distribution point for language resources. Since that time, and with the help of its members, LDC has grown into an organization that creates and distributes a wide array of language resources. LDC also supports sponsored research programs and language-based technology evaluations by providing resources and contributing organizational expertise. LDC is hosted by the University of Pennsylvania and is a center within the University’s School of Arts and Sciences.
HunCLARIN is a strategic research infrastructure of Hungary’s leading knowledge centres involved in R&D in speech- and language processing. It contains linguistic resources and tools that form the basis of research. The infrastructure has obtained an “SKI” qualification (Strategic Research Infrastructure) in 2010, and has been significantly expanded since. Currently comprising 36 members, the infrastructure includes several general- and specific-purpose text corpora, different language processing tools and analysers, linguistic databases as well as ontologies. RIL HAS was a co-founder of the European CLARIN project, which aims at supporting humanities and social sciences research with the help of language technology and by making digital linguistic resources more easily available. In accordance with these goals HunClarin makes the research infrastructures developed by the respective centres directly accessible for researchers through a common network entry point. A general goal of the infrastructure is to realise the interoperability of the collected research infrastructures and to enable comparing the performance of the respective alternatives and to coordinate different foci in R&D. The coordinator and contact person of the infrastructure is Tamás Váradi, RIL HAS.
Clarin.dk is a Danish IT infrastructure intended for use by humanities scholars. The infrastructure includes digitized research material in the form of written and spoken texts, audio and video records, lexical resources and tools. Part of the resources collected or converted to other formats as part of the project. Other resources developed or collected in other projects and made available to researchers through Clarin.dk. The website is regularly updated with new materials and tools. The vision is to create the humanities researcher's toolbox through the creation of resources with associated tools and integrating resources together in a web-based electronic research environment provided for humanities researchers. Such access to resources and tools will allow scientists unprecedented opportunities and will also help to enhance their ability to participate in European collaborative projects. The Danish CLARIN project will eventually generate better conditions for Danish language technology research and development.
The Text Laboratory provides assistance with databases, word lists, corpora and tailored solutions for language technology. We also work on research and development projects alone or in cooperation with others - locally, nationally and internationally. Services and tools: Word and frequency lists, Written corpora, Speech corpora, Multilingual corpora, Databases, Glossa Search Tool, The Oslo-Bergen Tagger, GREI grammar games, Audio files: dialects from Norway and America etc., Nordic Atlas of Language Structures (NALS) Journal, Norwegian in America, NEALT, Ethiopian Language Technology, Access to Corpora
The repository of the Hamburg Centre for Speech Corpora is used for archiving, maintenance, distribution and development of spoken language corpora. These usually consist of audio and / or video recordings, transcriptions and other data and structured metadata. The corpora treat the focus on multilingualism and are generally freely available for research and teaching. Most of the measures maintained by the HZSK corpora were created in the years 2000-2011 in the framework of the SFB 538 "Multilingualism" at the University of Hamburg. The HZSK however also strives to take linguistic data from other projects or contexts, and to provide also the scientific community for research and teaching are available, provided that they are compatible with the current focus of HZSK, ie especially spoken language and multilingualism.
As a member of SWE-CLARIN, the Humanities Lab will provide tools and expertise related to language archiving, corpus and (meta)data management, with a continued emphasis on multimodal corpora, many of which contain Swedish resources, but also other (often endangered) languages, multilingual or learner corpora. As a CLARIN K-centre we provide advice on multimodal and sensor-based methods, including EEG, eye-tracking, articulography, virtual reality, motion capture, av-recording. Current work targets automatic data retrieval from multimodal data sets, as well as the linking of measurement data (e.g. EEG, fMRI) or geo-demographic data (GIS, GPS) to language data (audio, video, text, annotations). We also provide assistance with speech and language technology related matters to various projects. A primary resource in the Lab is The Humanities Lab corpus server, containing a varied set of multimodal language corpora with standardised metadata and linked layers of annotations and other resources.
Competence Centre IULA-UPF-CC CLARIN manages, disseminates and facilitates this catalogue, which provides access to reference information on the use of language technology projects and studies in different disciplines, especially with regard to Humanities and Social Sciences. The Catalog relates information that is organized by Áreas, (disciplines and research topics), Projects (of research that use or have used language technologies), Tasks (that make the tools), Tools (of language technology), Documentation (articles regarding the tools and how they are used) and resources such as Corpora (collections of annotated texts) and Lexica (collections of words for different uses).
Lithuania became a full member of CLARIN ERIC in January of 2015 and soon CLARIN-LT consortium was founded by three partner universities: Vytautas Magnus University, Kaunas Technology University and Vilnius University. The main goal of the consortium is to become a CLARIN B centre, which will be able to serve language users in Lithuania and Europe for storing and accessing language resources.
The IDS Repository aims at long-term archival of linguistic resources and tools in the field of German studies. It provides data together with metadata in Dublin Core and CMDI formats. The Mannheim Corpus of historical newspapers and magazines consists of 21 german newspapers and magazines from the 18th and 19th century. It comprises about 750 individual volumes on 3532 pages overall. This corpus was assembled and digitized from 2009 to 2011.
CLARIN is a European Research Infrastructure for the Humanities and Social Sciences, focusing on language resources (data and tools). It is being implemented and constantly improved at leading institutions in a large and growing number of European countries, aiming at improving Europe's multi-linguality competence. CLARIN provides several services, such as access to language data and tools to analyze data, and offers to deposit research data, as well as direct access to knowledge about relevant topics in relation to (research on and with) language resources. The main tool is the 'Virtual Language Observatory' providing metadata and access to the different national CLARIN centers and their data.
Currently, the IMS repository focuses on resources provided by the Institute for Natural Language Processing in Stuttgart (IMS) and other CLARIN-D related institutions such as the local Collaborative Research Centre 732 (SFB 732) as well as institutions and/or organizations that belong to the CLARIN-D extended scientific community. Comprehensive guidelines and workflows for submission by external contributors are being compiled based on the experiences in archiving such in-house resources.
The Language Archive is storing a lot of unique material, from a large variety of languages worldwide, which is recorded and analyzed by researchers from different linguistic disciplines. Data creation, management and exploration tools. Archiving and software expertise for the Digital Humanities.
The Tromsø Repository of Language and Linguistics (TROLLing) is designed as an archive of linguistic data and statistical code. The archive is open access, which means that all information is available to everyone. All postings are accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data (where relevant). Linguists worldwide are invited to post datasets and statistical code used in their linguistic research.
CLARIN.SI is the Slovenian node of the European CLARIN (Common Language Resources and Technology Infrastructure) Centers. The CLARIN.SI repository is hosted at the Jožef Stefan Institute and offers long-term preservation of deposited linguistic resources, along with their descriptive metadata. The integration of the repository with the CLARIN infrastructure gives the deposited resources wide exposure, so that they can be known, used and further developed beyond the lifetime of the projects in which they were produced. Among the resources currently available in the CLARIN.SI repository are the multilingual MULTEXT-East resources, the CC version of Slovenian reference corpus Gigafida, the morphological lexicon Sloleks, the IMP corpora and lexicons of historical Slovenian, as well as many other resources for a variety of languages. Furthermore, several REST-based web services are provided for different corpus-linguistic and NLP tasks.
ARCHE (A Resource Centre for the HumanitiEs) is a service aimed at offering stable and persistent hosting as well as dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields. ARCHE - being the successor of the Language Resources Portal (LRP) - is offering language resources as Austria’s connection point to the European network of CLARIN Centres.