  • * at the end of a keyword allows wildcard searches
  • " quotes can be used for searching phrases
  • + represents an AND search (default)
  • | represents an OR search
  • - represents a NOT operation
  • ( and ) imply priority
  • ~N after a word specifies the desired edit distance (fuzziness)
  • ~N after a phrase specifies the desired slop amount
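As an illustration of how the operators listed above combine, here is a minimal sketch that assembles a query string from them. The helper names (`prefix`, `phrase`, `fuzzy`, `all_of`, `any_of`, `none_of`) are hypothetical and not part of the site; how the search box treats whitespace around operators is also an assumption.

```python
from typing import Optional


def prefix(word: str) -> str:
    """Wildcard search: '*' at the end of a keyword."""
    return f"{word}*"


def phrase(text: str, slop: Optional[int] = None) -> str:
    """Phrase search in quotes; '~N' after the phrase sets the slop amount."""
    quoted = f'"{text}"'
    return quoted if slop is None else f"{quoted}~{slop}"


def fuzzy(word: str, distance: int) -> str:
    """Fuzzy search: '~N' after a word sets the edit distance."""
    return f"{word}~{distance}"


def all_of(*terms: str) -> str:
    """AND search: terms joined with '+' (the default operator)."""
    return " + ".join(terms)


def any_of(*terms: str) -> str:
    """OR search: terms joined with '|', wrapped in parentheses for priority."""
    return "(" + " | ".join(terms) + ")"


def none_of(term: str) -> str:
    """NOT operation: '-' in front of a term."""
    return f"-{term}"


# Hypothetical example query built only from the operators in the list above.
query = all_of(
    prefix("linguist"),
    any_of(phrase("language atlas", slop=2), fuzzy("corpus", 1)),
    none_of("genomics"),
)
print(query)
```

The resulting string can be pasted into the search field; the search terms themselves are invented for the example.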
Found 13 result(s)
The Australian National Corpus collates and provides access to assorted examples of Australian English text, transcriptions, audio and audio-visual materials. Text analysis tools are embedded in the interface allowing analysis and downloads in *.CSV format.
Created in 2005 by the CNRS, the CNRTL unites a set of linguistic resources and language-processing tools in a single portal. Its work covers the identification, documentation (metadata), standardization, storage, enhancement and dissemination of resources. The sustainability of the service and the data is guaranteed by the backing of the UMR ATILF (CNRS - Université Nancy), the support of the CNRS, and its integration into the excellence equipment project ORTOLANG.
The knowledge centre is an information service offering advice on the use of digital language resources and tools for Swedish and other languages in Sweden, as well as other parts of the intangible cultural heritage of Sweden.
Codex Sinaiticus is one of the most important books in the world. Handwritten well over 1600 years ago, the manuscript contains the Christian Bible in Greek, including the oldest complete copy of the New Testament. The Codex Sinaiticus Project is an international collaboration to reunite the entire manuscript in digital form and make it accessible to a global audience for the first time. Drawing on the expertise of leading scholars, conservators and curators, the Project gives everyone the opportunity to connect directly with this famous manuscript.
The aim of the project is the systematic mapping of Czech and of other languages in comparison with Czech. After free registration, CNC corpora are accessible to anyone interested in studying the language.
The English Lexicon Project (supported by the National Science Foundation) affords access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.
The vocabulary of forenames is a simple, multilingual vocabulary (i.e. without hierarchies etc.) in which the forenames of the project partners’ persons and the forenames’ spelling variants, both historical and dialectal, are documented with references or passages. As a rule, each forename is assigned one or more persons bearing that name. There is a hit list of the most frequent forenames between 200 BC and AD 2016 as well as a visualisation in word clouds and the occurrences in a timeline.
A place of living memory, the Phonotheque of the MMSH aims to bring together sound-heritage recordings that have value as ethnological, linguistic, historical, musicological or literary information on the Mediterranean area. It documents fields little covered by conventional sources, or supplements them with the point of view of actors or witnesses. The collection holds more than 8000 hours of audio archives, recorded since the late 1950s, covering all the humanities.
MICASE provides a collection of transcripts of academic speech events recorded at the University of Michigan. The original DAT audiotapes are held in the English Language Institute and may be consulted by bona fide researchers under special arrangements. Additional access: https://lsa.umich.edu/eli/language-resources/micase-micusp.html
The Speaking Language Atlas gives a multimedia impression of the dialects of the state of Baden-Württemberg in Germany. Its maps are based on two databases: the Südwestdeutschen Sprachatlas (SSA) and the Sprachatlas von Nord Baden-Württemberg (SNBW). The dialect recordings that form the basis for the maps were made for the SSA between 1974 and 1986, and for the SNBW between 2009 and 2012. For the southern part, this means that the maps may present a state of affairs that is no longer valid today.
The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors (many of them the leading authorities on the subject).
Welcome to the UCLA Phonetics Lab Archive. For over half a century, the UCLA Phonetics Laboratory has collected recordings of hundreds of languages from around the world, providing source materials for phonetic and phonological research, of value to scholars, speakers of the languages, and language learners alike. The materials on this site comprise audio recordings illustrating phonetic structures from over 200 languages with phonetic transcriptions, plus scans of original field notes where relevant.
UK RED is a database documenting the history of reading in Britain from 1450 to 1945. The reading experiences of British subjects presented in UK RED, both at home and abroad, are drawn from published and unpublished sources as diverse as diaries, commonplace books, memoirs, sociological surveys, and criminal court and prison records.