Filter
Reset all

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
  • 1 (current)
Found 18 result(s)
CLARIN is a European Research Infrastructure for the Humanities and Social Sciences, focusing on language resources (data and tools). It is being implemented and constantly improved at leading institutions in a large and growing number of European countries, aiming at improving Europe's multi-linguality competence. CLARIN provides several services, such as access to language data and tools to analyze data, and offers to deposit research data, as well as direct access to knowledge about relevant topics in relation to (research on and with) language resources. The main tool is the 'Virtual Language Observatory' providing metadata and access to the different national CLARIN centers and their data.
OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. The OLAC system has 2016 been integrated with the Linguistic Linked Open Data Cloud.
Country
GAMS is an OAIS compliant asset management system for the management, publication and long-term archiving of digital resources from the Humanities.
The German Text Archive (Deutsches Textarchiv, DTA) presents online a selection of key German-language works in various disciplines from the 17th to 19th centuries. The electronic full-texts are indexed linguistically and the search facilities tolerate a range of spelling variants. The DTA presents German-language printed works from around 1650 to 1900 as full text and as digital facsimile. The selection of texts was made on the basis of lexicographical criteria and includes scientific or scholarly texts, texts from everyday life, and literary works. The digitalisation was made from the first edition of each work. Using the digital images of these editions, the text was first typed up manually twice (‘double keying’). To represent the structure of the text, the electronic full-text was encoded in conformity with the XML standard TEI P5. The next stages complete the linguistic analysis, i.e. the text is tokenised, lemmatised, and the parts of speech are annotated. The DTA thus presents a linguistically analysed, historical full-text corpus, available for a range of questions in corpus linguistics. Thanks to the interdisciplinary nature of the DTA Corpus, it also offers valuable source-texts for neighbouring disciplines in the humanities, and for scientists, legal scholars and economists.
SWE-CLARIN is a national node in European Language and Technology Infrastructure (CLARIN) - an ESFRI initiative to build an infrastructure for e-science in the humanities and social sciences. SWE-CLARIN makes language-based materials available as research data using advanced processing tools and other resources. One basic idea is that the increasing amount of text and speech - contemporary and historical - as digital research material enables new forms of e-science and new ways to tackle old research issues.
The Text Laboratory provides assistance with databases, word lists, corpora and tailored solutions for language technology. We also work on research and development projects alone or in cooperation with others - locally, nationally and internationally. Services and tools: Word and frequency lists, Written corpora, Speech corpora, Multilingual corpora, Databases, Glossa Search Tool, The Oslo-Bergen Tagger, GREI grammar games, Audio files: dialects from Norway and America etc., Nordic Atlas of Language Structures (NALS) Journal, Norwegian in America, NEALT, Ethiopian Language Technology, Access to Corpora
CLARIN-LV is a national node of Clarin ERIC (Common Language Resources and Technology Infrastructure). The mission of the repository is to ensure the availability and long­ term preservation of language resources. The data stored in the repository are being actively used and cited in scientific publications.
Country
Arquivo.pt is a research infrastructure that preserves millions of files collected from the web since 1996 and provides a public search service over this information. It contains information in several languages. Periodically it collects and stores information published on the web. Then, it processes the collect data to make it searchable, providing a “Google-like” service that enables searching the past web (English user interface available at https://arquivo.pt/?l=en). This preservation workflow is performed through a large-scale distributed information system and can also accessed through API (https://arquivo.pt/api).
The CLARIN Centre at the University of Copenhagen, Denmark, hosts and manages a data repository (CLARIN-DK-UCPH Repository), which is part of a research infrastructure for humanities and social sciences financed by the University of Copenhagen. The CLARIN-DK-UCPH Repository provides easy and sustainable access for scholars in the humanities and social sciences to digital language data (in written, spoken, video or multimodal form) and provides advanced tools for discovering, exploring, exploiting, annotating, and analyzing data. CLARIN-DK also shares knowledge on Danish language technology and resources and is the Danish node in the European Research Infrastructure Consortium, CLARIN ERIC.
Country
The program "Humanist Virtual Libraries" distributes heritage documents and pursues research associating skills in human sciences and computer science. It aggregates several types of digital documents: A selection of facsimiles of Renaissance works digitized in the Central Region and in partner institutions, the Epistemon Textual Database, which offers digital editions in XML-TEI, and Transcripts or analyzes of notarial minutes and manuscripts
Country
clarin:el is the Greek national network of language resources, a nation-wide Research Infrastructure devoted to the sustainable storage, sharing, dissemination and preservation of language resources. CLARIN EL infrastructure, which is a Greek nation-wide Research Infrastructure devoted to the sustainable storage, sharing, dissemination and preservation of language resources (LRs) and aims at increasing access to and augmentation of such resources at a national scale and beyond. It is an open, integrated, secure and interoperable storage, sharing and processing infrastructure for LRs (datasets, tools and processing services) for all domains domains and disciplines where language plays a critical role, notably. CLARIN EL is implemented in the framework of the CLARIN Attiki, national project in support of ESFRI/2006 Research Infrastructures.
CLARIN-UK is a consortium of centres of expertise involved in research and resource creation involving digital language data and tools. The consortium includes the national library, and academic departments and university centres in linguistics, languages, literature and computer science.
The focus of PolMine is on texts published by public institutions in Germany. Corpora of parliamentary protocols are at the heart of the project: Parliamentary proceedings are available for long stretches of time, cover a broad set of public policies and are in the public domain, making them a valuable text resource for political science. The project develops repositories of textual data in a sustainable fashion to suit the research needs of political science. Concerning data, the focus is on converting text issued by public institutions into a sustainable digital format (TEI/XML).
The repository is part of the National Research Data Infrastructure initiative Text+, in which the University of Tübingen is a partner. It is housed at the Department of General and Computational Linguistics. The infrastructure is maintained in close cooperation with the Digital Humanities Centre, which is a core facility of the university, colaborating with the library and computing center of the university. Integration of the repository into the national CLARIN-D and international CLARIN infrastructures gives it wide exposure, increasing the likelihood that the resources will be used and further developed beyond the lifetime of the projects in which they were developed. Among the resources currently available in the Tübingen Center Repository, researchers can find widely used treebanks of German (e.g. TüBa-D/Z), the German wordnet (GermaNet), the first manually annotated digital treebank (Index Thomisticus), as well as descriptions of the tools used by the WebLicht ecosystem for natural language processing.
In collaboration with other centres in the Text+ consortium and in the CLARIN infrastructure, the CLARIND-UdS enables eHumanities by providing a service for hosting and processing language resources (notably corpora) for members of the research community. CLARIND-UdS centre thus contributes of lifting the fragmentation of language resources by assisting members of the research community in preparing language materials in such a way that easy discovery is ensured, interchange is facilitated and preservation is enabled by enriching such materials with meta-information, transforming them into sustainable formats and hosting them. We have an explicit mission to archive language resources especially multilingual corpora (parallel, comparable) and corpora including specific registers, both collected by associated researchers as well as researchers who are not affiliated with us.
CLAPOP is the portal of the Dutch CLARIN community. It brings together all relevant resources that were created within the CLARIN NL project and that now are part of the CLARIN NL infrastructure or that were created by other projects but are essential for the functioning of the CLARIN (NL) infrastructure. CLARIN-NL has closely cooperated with CLARIN Flanders in a number of projects. The common results of this cooperation and the results of this cooperation created by CLARIN Flanders are included here as well.
Currently the institute has more than 700 collections consisting of (digital) research data, digitized material, archival collections, printed material, handwritten questionnaires, maps and pictures. The focus is on resources relevant for the study of function, meaning and coherence of cultural expressions and resources relevant for the structural, dialectological and sociolinguistic study of language variation within the Dutch language. An overview is here https://meertens.knaw.nl/en/datasets/