Filter
Reset all

Subjects

Content Types

Countries

AID systems

API

Certificates

Data access

Data access restrictions

Database access

Database access restrictions

Database licenses

Data licenses

Data upload

Data upload restrictions

Enhanced publication

Institution responsibility type

Institution type

Keywords

Metadata standards

PID systems

Provider types

Quality management

Repository languages

Software

Syndications

Repository types

Versioning

  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) implies priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
  • 1 (current)
Found 19 result(s)
ARCHE (A Resource Centre for the HumanitiEs) is a service aimed at offering stable and persistent hosting as well as dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields. ARCHE is the successor of the Language Resources Portal (LRP) and acts as Austria’s connection point to the European network of CLARIN Centres for language resources.
Country
The Australian National Corpus collates and provides access to assorted examples of Australian English text, transcriptions, audio and audio-visual materials. Text analysis tools are embedded in the interface allowing analysis and downloads in *.CSV format.
The Bavarian Archive for Speech Signals (BAS) is a public institution hosted by the University of Munich. This institution was founded with the aim of making corpora of current spoken German available to both the basic research and the speech technology communities via a maximally comprehensive digital speech-signal database. The speech material will be structured in a manner allowing flexible and precise access, with acoustic-phonetic and linguistic-phonetic evaluation forming an integral part of it.
Lithuania became a full member of CLARIN ERIC in January of 2015 and soon CLARIN-LT consortium was founded by three partner universities: Vytautas Magnus University, Kaunas Technology University and Vilnius University. The main goal of the consortium is to become a CLARIN B centre, which will be able to serve language users in Lithuania and Europe for storing and accessing language resources.
The Endangered Languages Archive (ELAR) is a digital repository for preserving multimedia collections of endangered languages from all over the world, making them available for future generations. In ELAR’s collections you can find recordings of every-day conversations, instructions on how to build fish traps or boats, explanations of kinship systems and the use of medicinal plants, and learn about art forms like string figures and sand drawings. ELAR’s collections are unique records of local knowledge systems encoded in their languages, described by the holders of the knowledge themselves.
The English Lexicon Project (supported by the National Science Foundation) affords access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.
The repository of the Hamburg Centre for Speech Corpora is used for archiving, maintenance, distribution and development of spoken language corpora. These usually consist of audio and / or video recordings, transcriptions and other data and structured metadata. The corpora treat the focus on multilingualism and are generally freely available for research and teaching. Most of the measures maintained by the HZSK corpora were created in the years 2000-2011 in the framework of the SFB 538 "Multilingualism" at the University of Hamburg. The HZSK however also strives to take linguistic data from other projects or contexts, and to provide also the scientific community for research and teaching are available, provided that they are compatible with the current focus of HZSK, ie especially spoken language and multilingualism.
By stimulating inspiring research and producing innovative tools, Huygens ING intends to open up old and inaccessible sources, and to understand them better. Huygens ING’s focus is on Digital Humanities, History, History of Science, and Textual Scholarship. Huygens ING pursues research in the fields of History, Literary Studies, the History of Science and Digital Humanities. Huygens ING aims to publish digital sources and data responsibly and with care. Innovative tools are made as widely available as possible. We strive to share the available knowledge at the institute with both academic peers and the wider public.
The Language Archive Cologne (LAC) is a research data repository for the linguistics and all humanities disciplines working with audiovisual data. The archive forms a cluster of the Data Center for Humanities in cooperation with the Institute of Linguistics of the University of Cologne. The LAC is an archive for language resources, which is freely available via a web-based access. In addition, concrete technical and methodological advice is offered in the research data cycle - from the collection of the data, their preparation and archiving, to publication and reuse.
Country
Lithuanian Data Archive for Social Sciences and Humanities (LiDA) is a virtual digital infrastructure for SSH data and research resources acquisition, long-term preservation and dissemination. All the data and research resources are documented in both English and Lithuanian according to international standards. Access to the resources is provided via Dataverse repository. LiDA curates different types of resources and they are published into catalogues according to the type: Survey Data, Aggregated Data (including Historical Statistics), Encoded Data (including News Media Studies), and Textual Data. Also, LiDA holds collections of social sciences and humanities data deposited by Lithuanian science and higher education institutions and Lithuanian state institutions (Data of Other Institutions). LiDA is hosted by the Centre for Data Analysis and Archiving of Kaunas University of Technology (data.ktu.edu).
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. It was formed in 1992 to address the critical data shortage then facing language technology research and development. Initially, LDC's primary role was as a repository and distribution point for language resources. Since that time, and with the help of its members, LDC has grown into an organization that creates and distributes a wide array of language resources. LDC also supports sponsored research programs and language-based technology evaluations by providing resources and contributing organizational expertise. LDC is hosted by the University of Pennsylvania and is a center within the University’s School of Arts and Sciences.
As a member of SWE-CLARIN, the Humanities Lab will provide tools and expertise related to language archiving, corpus and (meta)data management, with a continued emphasis on multimodal corpora, many of which contain Swedish resources, but also other (often endangered) languages, multilingual or learner corpora. As a CLARIN K-centre we provide advice on multimodal and sensor-based methods, including EEG, eye-tracking, articulography, virtual reality, motion capture, av-recording. Current work targets automatic data retrieval from multimodal data sets, as well as the linking of measurement data (e.g. EEG, fMRI) or geo-demographic data (GIS, GPS) to language data (audio, video, text, annotations). We also provide assistance with speech and language technology related matters to various projects. A primary resource in the Lab is The Humanities Lab corpus server, containing a varied set of multimodal language corpora with standardised metadata and linked layers of annotations and other resources.
The Phonogrammarchiv is a multi-disciplinary research sound and video archive, covering holdings from all continents. Since its foundation in 1899 the Phonogrammarchiv has been building up its holdings by cooperating with Austrian scholars and archiving their collected material, or by fieldwork conducted by staff members on special topics exploring new fields of methods and contents. The main tasks comprise the production, annotation, cataloguing and long-term preservation of audio-visual field recordings, making the cultural heritage available for future generations and enabling the dissemination of the recordings as well as technical developments in the field of AV recording and storage. Thus the Phonogrammarchiv adds to infrastructural performance valuable to both the scholarly community and the public at large.
PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, belonging to the Portuguese National Roadmap of Research Infrastructures of Strategic Relevance, and part of the international research infrastructure CLARIN ERIC
Språkbanken was established in 1975 as a national center located in the Faculty of Arts, University of Gothenburg. Allén's groundbreaking corpus linguistic research resulted in the creation of one of the first large electronic text corpora in another language than English, with one million words of newspaper text. The task of Språkbanken is to collect, develop, and store (Swedish) text corpora, and to make linguistic data extracted from the corpora available to researchers and to the public.
SWE-CLARIN is a national node in European Language and Technology Infrastructure (CLARIN) - an ESFRI initiative to build an infrastructure for e-science in the humanities and social sciences. SWE-CLARIN makes language-based materials available as research data using advanced processing tools and other resources. One basic idea is that the increasing amount of text and speech - contemporary and historical - as digital research material enables new forms of e-science and new ways to tackle old research issues.
Country
The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors (many of them the leading authorities on the subject).
The Tromsø Repository of Language and Linguistics (TROLLing) is a FAIR-aligned repository of linguistic data and statistical code. The archive is open access, which means that all information is available to everyone. All data are accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data (where relevant). Linguists worldwide are invited to deposit data and statistical code used in their linguistic research. TROLLing is a special collection within DataverseNO (http://doi.org/10.17616/R3TV17), and C Centre within CLARIN (Common Language Resources and Technology Infrastructure, a networked federation of European data repositories; http://www.clarin.eu/), and harvested by their Virtual Language Observatory (VLO; https://vlo.clarin.eu/).
Content type(s)
UK RED is a database documenting the history of reading in Britain from 1450 to 1945. Reading experiences of British subjects, both at home and abroad presented in UK RED are drawn from published and unpublished sources as diverse as diaries, commonplace books, memoirs, sociological surveys, and criminal court and prison records.