Filter
Reset all

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
Found 37 result(s)
The Alaska Native Language Archive houses documentation of the various Native languages of Alaska and helps to preserve and cultivate this unique heritage for future generations. As the premier repository worldwide for information relating to the Native languages of Alaska, the Archive serves researchers, teachers and students, as well as members of the broader community. The collection includes both published and unpublished materials in or on all of the Alaska Native languages and related languages. The collection has enduring cultural, historic, and intellectual value, particularly for Alaska Native language speakers and their descendants
ILC-CNR for CLARIN-IT repository is a library for linguistic data and tools. Including: Text Processing and Computational Philology; Natural Language Processing and Knowledge Extraction; Resources, Standards and Infrastructures; Computational Models of Language Usage. The studies carried out within each area are highly interdisciplinary and involve different professional skills and expertises that extend across the disciplines of Linguistics, Computational Linguistics, Computer Science and Bio-Engineering.
The Tromsø Repository of Language and Linguistics (TROLLing) is a FAIR-aligned repository of linguistic data and statistical code. The archive is open access, which means that all information is available to everyone. All data are accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data (where relevant). Linguists worldwide are invited to deposit data and statistical code used in their linguistic research. TROLLing is a special collection within DataverseNO (http://doi.org/10.17616/R3TV17), and C Centre within CLARIN (Common Language Resources and Technology Infrastructure, a networked federation of European data repositories; http://www.clarin.eu/), and harvested by their Virtual Language Observatory (VLO; https://vlo.clarin.eu/).
Country
The Informatics Research Data Repository is a Japanese data repository that collects data on disciplines within informatics. Such sub-categories are things like consumerism and information diffusion. The primary data within these data sets is from experiments run by IDR on how one group is linked to another.
Country
DAIS - Digital Archive of the Serbian Academy of Sciences and Arts is a joint digital repository of the Serbian Academy of Sciences and Arts (SASA) and the research institutes under the auspices of SASA. The aim of the repository is to provide open access to publications and other research outputs resulting from the projects implemented by the SASA and its institutes. The repository uses a DSpace-based software platform developed and maintained by the Belgrade University Computer Centre (RCUB).
The project is set up in order to improve the infrastructure for text-based linguistic research and development by building a huge, automatically annotated German text corpus and the corresponding tools for corpus annotation and exploitation. DeReKo constitutes the largest linguistically motivated collection of contemporary German texts, contains fictional, scientific and newspaper texts, as well as several other text types, contains only licenced texts, is encoded with rich meta-textual information, is fully annotated morphosyntactically (three concurrent annotations), is continually expanded, with a focus on size and stratification of data, may be analyzed free of charge via the query system COSMAS II, serves as a 'primordial sample' from which users may draw specialized sub-samples (socalled 'virtual corpora') to represent the language domain they wish to investigate. !!! Access to data of Das Deutsche Referenzkorpus is also provided by: IDS Repository https://www.re3data.org/repository/r3d100010382 !!!
CLARIN.SI is the Slovenian node of the European CLARIN (Common Language Resources and Technology Infrastructure) Centers. The CLARIN.SI repository is hosted at the Jožef Stefan Institute and offers long-term preservation of deposited linguistic resources, along with their descriptive metadata. The integration of the repository with the CLARIN infrastructure gives the deposited resources wide exposure, so that they can be known, used and further developed beyond the lifetime of the projects in which they were produced. Among the resources currently available in the CLARIN.SI repository are the multilingual MULTEXT-East resources, the CC version of Slovenian reference corpus Gigafida, the morphological lexicon Sloleks, the IMP corpora and lexicons of historical Slovenian, as well as many other resources for a variety of languages. Furthermore, several REST-based web services are provided for different corpus-linguistic and NLP tasks.
The repository is part of the National Research Data Infrastructure initiative Text+, in which the University of Tübingen is a partner. It is housed at the Department of General and Computational Linguistics. The infrastructure is maintained in close cooperation with the Digital Humanities Centre, which is a core facility of the university, colaborating with the library and computing center of the university. Integration of the repository into the national CLARIN-D and international CLARIN infrastructures gives it wide exposure, increasing the likelihood that the resources will be used and further developed beyond the lifetime of the projects in which they were developed. Among the resources currently available in the Tübingen Center Repository, researchers can find widely used treebanks of German (e.g. TüBa-D/Z), the German wordnet (GermaNet), the first manually annotated digital treebank (Index Thomisticus), as well as descriptions of the tools used by the WebLicht ecosystem for natural language processing.
Currently the institute has more than 700 collections consisting of (digital) research data, digitized material, archival collections, printed material, handwritten questionnaires, maps and pictures. The focus is on resources relevant for the study of function, meaning and coherence of cultural expressions and resources relevant for the structural, dialectological and sociolinguistic study of language variation within the Dutch language. An overview is here https://meertens.knaw.nl/en/datasets/