Found 17 result(s)
LAUDATIO has developed an open access research data repository for historical corpora. To support the access and (re-)use of historical corpora, the LAUDATIO repository uses a flexible documentation schema based on a subset of TEI, customized with TEI ODD. The extensive metadata schema contains information about the preparation and checking methods applied to the data, the tools, formats and annotation guidelines used in the project, as well as bibliographic metadata and information on the research context (e.g. the research project). To enable complex and comprehensive searches of the annotation data, the search and visualization tool ANNIS is integrated into the LAUDATIO repository.
This repository is no longer available. See https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7021#!/details and https://ota.bodleian.ox.ac.uk/repository/xmlui/discover?query=germanc&submit=Search&filtertype_1=title&filter_relational_operator_1=contains&filter_1=&query=germanc
The Diccionario del Español Medieval electrónico (DEMel) provides a lemmatised and semantically pre-structured electronic data archive on medieval Spanish. The data collection is based on the digitised card-index material of the Diccionario del Español Medieval (DEM), which was discontinued at the end of 2007 for financial reasons. The DEM covers the development of Spanish vocabulary from the 10th to the beginning of the 15th century on the basis of well over 600 literary and non-literary works or collections of texts and documents.
The Manchester Romani Project is part of an international network of scholarly projects devoted to research on Romani language and linguistics, coordinated in partnership with Dieter Halwachs (Institute of Linguistics, Graz University and Romani-Projekt Graz) and Peter Bakker (Institute of Linguistics, Aarhus University). The project explores the linguistic features of the dialects of the Romani language and their distribution in geographical space. An interactive web application is being designed that will allow users to search for different dialectal variants, locate them on a map, and explore how variants cluster in particular regions. Example sentences and words with sound files will also be made available to give an impression of dialectal variation within Romani. From the distribution of linguistic forms among the dialects it will be possible to draw inferences about socio-historical contacts among Romani communities and about migration patterns.
A place of living memory, the Phonotheque of the MMSH aims to bring together sound-heritage recordings of ethnological, linguistic, historical, musicological or literary value relating to the Mediterranean area. It documents fields that conventional sources cover poorly, or complements those sources with the perspective of actors and witnesses. The collection holds more than 8,000 hours of audio archives, recorded since the late 1950s, covering all fields of the humanities.
The vocabulary of forenames is a simple (i.e. non-hierarchical), multilingual vocabulary in which the forenames of persons recorded by the project partners, together with their historical and dialectal spelling variants, are documented with references or source passages. As a rule, each forename is linked to one or more persons bearing that name. A list of the most frequent forenames between 200 BC and AD 2016 is provided, along with visualisations as word clouds and a timeline of occurrences.
The Endangered Languages Archive (ELAR) is a digital repository for preserving multimedia collections of endangered languages from all over the world and making them available for future generations. In ELAR's collections you can find recordings of everyday conversations, instructions on how to build fish traps or boats, and explanations of kinship systems and the use of medicinal plants, and you can learn about art forms such as string figures and sand drawings. ELAR's collections are unique records of local knowledge systems encoded in their languages, described by the holders of that knowledge themselves.
The Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) is a CLARIN partner institution and has been an officially certified CLARIN service center since June 20th, 2013. The CLARIN center at the BBAW focuses on historical text corpora (predominantly provided by the 'Deutsches Textarchiv'/German Text Archive, DTA) as well as on lexical resources (e.g. dictionaries provided by the 'Digitales Wörterbuch der Deutschen Sprache'/Digital Dictionary of the German Language, DWDS).
SWE-CLARIN is the Swedish national node of CLARIN (Common Language Resources and Technology Infrastructure), an ESFRI initiative to build a European infrastructure for e-science in the humanities and social sciences. SWE-CLARIN makes language-based materials available as research data, together with advanced processing tools and other resources. A basic premise is that the growing amount of text and speech, both contemporary and historical, available as digital research material enables new forms of e-science and new ways of tackling old research questions.
The Language Archive at the Max Planck Institute in Nijmegen provides a unique record of how people around the world use language in everyday life. It focuses on collecting spoken and signed language materials in audio and video form along with transcriptions, analyses, annotations and other types of relevant material (e.g. photos, accompanying notes).
The Speaking Language Atlas gives a multimedia impression of the dialects of the German state of Baden-Württemberg. Its maps are based on two databases: the Südwestdeutscher Sprachatlas (SSA) and the Sprachatlas von Nord Baden-Württemberg (SNBW). The dialect recordings underlying the maps were made between 1974 and 1986 for the SSA, but between 2009 and 2012 for the SNBW. For the southern part of the state, covered by the SSA, the maps may therefore present a state of affairs that no longer holds today.
PARADISEC (the Pacific And Regional Archive for Digital Sources in Endangered Cultures) offers a facility for the digital conservation of, and access to, endangered materials from all over the world. Our research group has developed models to ensure that the archive can provide access to interested communities and conforms to emerging international standards for digital archiving. We have established a framework for accessioning, cataloguing and digitising audio, text and visual material, and for preserving digital copies. The primary focus of this initial stage is the safe preservation of material that would otherwise be lost, especially field tapes from the 1950s and 1960s.
CLARIN.SI is the Slovenian node of the European CLARIN (Common Language Resources and Technology Infrastructure) research infrastructure. The CLARIN.SI repository is hosted at the Jožef Stefan Institute and offers long-term preservation of deposited linguistic resources along with their descriptive metadata. Integrating the repository with the CLARIN infrastructure gives the deposited resources wide exposure, so that they can be known, used and further developed beyond the lifetime of the projects that produced them. Resources currently available in the CLARIN.SI repository include the multilingual MULTEXT-East resources, the CC version of the Slovenian reference corpus Gigafida, the morphological lexicon Sloleks, the IMP corpora and lexicons of historical Slovenian, and many other resources for a variety of languages. In addition, several REST-based web services are provided for various corpus-linguistic and NLP tasks.
ANPERSANA is the digital library of IKER (UMR 5478), a research centre specializing in Basque language and texts. The online library platform receives and disseminates primary data sources arising from research on Basque language and culture. Two corpora of documents have been published to date. The first is a collection of private letters written in an 18th-century variety of Basque and transcribed into modern standard Basque. The discovery of this collection, named Le Dauphin, has raised new questions about the history and sociology of writing in minority languages, not only in France but across the whole Atlantic Arc. The second corpus is a selection of sound recordings about monodic chant in the Basque Country. The documents were collected as part of PhD research carried out between 2003 and 2012, and comprise a total of 50 hours of interviews with French-speaking and Basque-speaking cultural representatives, conducted either at the informants' workplaces or in public places. ANPERSANA includes an advanced search engine, and the documents have been indexed and geolocated on an interactive map. The platform is committed to open access, and all resources are freely available under various Creative Commons (CC) licenses.
The German Text Archive (Deutsches Textarchiv, DTA) presents online a selection of key German-language works from various disciplines, dating from the 17th to the 19th century. The electronic full texts are linguistically indexed, and the search facilities tolerate a range of spelling variants. The DTA presents German-language printed works from around 1650 to 1900 as full text and as digital facsimile. The texts were selected on the basis of lexicographical criteria and include scientific and scholarly texts, texts from everyday life, and literary works. Digitisation was based on the first edition of each work. Using the digital images of these editions, each text was first typed up manually twice ("double keying"). To represent the structure of the text, the electronic full text was encoded in conformity with the XML standard TEI P5. Subsequent stages carry out the linguistic analysis: the text is tokenised and lemmatised, and parts of speech are annotated. The DTA thus presents a linguistically analysed historical full-text corpus that supports a range of questions in corpus linguistics. Thanks to the interdisciplinary nature of the DTA corpus, it also offers valuable source texts for neighbouring disciplines in the humanities, as well as for scientists, legal scholars and economists.
The Lithuanian Data Archive for Social Sciences and Humanities (LiDA) is a virtual digital infrastructure for the acquisition, long-term preservation and dissemination of social sciences and humanities (SSH) data and research resources. All data and research resources are documented in both English and Lithuanian according to international standards. Access to the resources is provided via a Dataverse repository. LiDA curates different types of resources and publishes them in separate catalogues by type: Survey Data, Aggregated Data (including Historical Statistics), Encoded Data (including News Media Studies), and Textual Data. LiDA also holds collections of social sciences and humanities data deposited by Lithuanian science and higher education institutions and by Lithuanian state institutions (Data of Other Institutions). LiDA is hosted by the Centre for Data Analysis and Archiving of Kaunas University of Technology (data.ktu.edu).