Found 19 result(s)
The Buckeye Corpus of conversational speech contains high-quality recordings of 40 speakers in Columbus, Ohio, conversing freely with an interviewer. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format that can be used with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is currently being written.
CHILDES is the child language component of the TalkBank system. TalkBank is a system for sharing and studying conversational interactions.
OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. In 2016, the OLAC system was integrated with the Linguistic Linked Open Data Cloud.
D-PLACE contains cultural, linguistic, environmental and geographic information for more than 1,400 human ‘societies’. A ‘society’ in D-PLACE represents a group of people in a particular locality who often share a language and cultural identity. All cultural descriptions are tagged with the date to which they refer and with the ethnographic sources that provided them. Most of the cultural descriptions in D-PLACE are based on ethnographic work carried out in the 19th and early 20th centuries (before 1950).
The University of Pittsburgh English Language Institute Corpus (PELIC) is a 4.2-million-word learner corpus of written texts. The texts were collected in an English for Academic Purposes (EAP) context over seven years in the University of Pittsburgh’s Intensive English Program and were produced by more than 1,100 students with a wide range of linguistic backgrounds and proficiency levels. Because PELIC is longitudinal, it offers opportunities to track learners' development over time in a natural classroom setting.
The Polinsky Language Sciences Lab at Harvard University is a linguistics lab that examines questions of language structure and its effect on the ways in which people use and process language in real time. We engage in linguistic and interdisciplinary research projects ourselves; offer linguistic research capabilities to undergraduate and graduate students, faculty, and visitors; and build relationships with the linguistic communities in which we do our research. We are interested in a broad range of issues pertaining to syntax, interfaces, and cross-linguistic variation, with a particular emphasis on novel experimental evidence that facilitates the construction of linguistic theory. We have a strong cross-linguistic focus, drawing on English, Russian, Chinese, Korean, Mayan languages, Basque, Austronesian languages, languages of the Caucasus, and others. We believe that challenging existing theories with data from as broad a range of languages as possible is a crucial component of the successful development of linguistic theory. We investigate both fluent speakers and heritage speakers, that is, people who grew up hearing or speaking a particular language but who are now more fluent in a different, societally dominant language. Heritage language research, a novel field of linguistic inquiry, is important because it provides new insights into processes of linguistic development and attrition in general, thus increasing our understanding of the human capacity to maintain and acquire language. Understanding language use and processing in real time, and how children acquire language, helps us improve language study and pedagogy, which in turn improves communication across the globe. Although our lab does not specialize in language acquisition, we have conducted some studies of the acquisition of lesser-studied languages and heritage languages, with the aim of comparing heritage speakers with adults.
MICASE provides a collection of transcripts of academic speech events recorded at the University of Michigan. The original DAT audiotapes are held in the English Language Institute and may be consulted by bona fide researchers under special arrangements. Additional access: https://lsa.umich.edu/eli/language-resources/micase-micusp.html
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. It was formed in 1992 to address the critical data shortage then facing language technology research and development. Initially, LDC's primary role was as a repository and distribution point for language resources. Since that time, and with the help of its members, LDC has grown into an organization that creates and distributes a wide array of language resources. LDC also supports sponsored research programs and language-based technology evaluations by providing resources and contributing organizational expertise. LDC is hosted by the University of Pennsylvania and is a center within the University’s School of Arts and Sciences.
In addition to the institutional repository, current St. Edward's faculty have the option of uploading their work directly to their own SEU accounts on stedwards.figshare.com. Projects created on Figshare are automatically published on this website as well. For more information, please see the documentation.
As of 01/12/2015, data deposit on the SLDR website will be suspended to allow the public opening of the Ortolang platform (https://www.ortolang.fr/#/market/home).
The English Lexicon Project (supported by the National Science Foundation) affords access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.
The Alaska Native Language Archive houses documentation of the various Native languages of Alaska and helps to preserve and cultivate this unique heritage for future generations. As the premier repository worldwide for information relating to the Native languages of Alaska, the Archive serves researchers, teachers and students, as well as members of the broader community. The collection includes both published and unpublished materials in or on all of the Alaska Native languages and related languages. The collection has enduring cultural, historic, and intellectual value, particularly for Alaska Native language speakers and their descendants.
The Manchester Romani Project is part of an international network of scholarly projects devoted to research on Romani language and linguistics, coordinated in partnership with Dieter Halwachs (Institute of Linguistics, Graz University and Romani-Projekt Graz) and Peter Bakker (Institute of Linguistics, Aarhus University). The project explores the linguistic features of the dialects of the Romani language and their distribution in geographical space. An interactive web application is being designed that will allow users to search for and locate different dialectal variants on a map, and to explore how variants cluster in particular regions. Example sentences and words with sound files will also be made available to give an impression of dialectal variation within Romani. From the distribution of linguistic forms among the dialects, it will be possible to draw inferences about social and historical contacts among Romani communities and about migration patterns.
Sinmin contains texts of different genres and styles of modern and old Sinhala. The main sources of electronic texts for the corpus are online Sinhala newspapers, online Sinhala news sites, Sinhala school textbooks available online, online Sinhala magazines, Sinhala Wikipedia, Sinhala fiction available online, the Mahawansa, Sinhala blogs, Sinhala subtitles, and the Sri Lankan gazette.
Welcome to the UCLA Phonetics Lab Archive. For over half a century, the UCLA Phonetics Laboratory has collected recordings of hundreds of languages from around the world, providing source materials for phonetic and phonological research, of value to scholars, speakers of the languages, and language learners alike. The materials on this site comprise audio recordings illustrating phonetic structures from over 200 languages with phonetic transcriptions, plus scans of original field notes where relevant.
The Digital South Asia Library provides digital materials for reference and research on South Asia to scholars, public officials, business leaders, and other users. This program builds upon a two-year pilot project funded by the Association of Research Libraries' Global Resources Program with support from the Andrew W. Mellon Foundation.