The Language Bank features text and speech corpora with different kinds of annotations in over 60 languages. There is also a selection of tools for working with them, from linguistic analyzers to programming environments. Corpora are also available via web interfaces, and some of them can be downloaded by authorized users. The IP holders can monitor the use of their resources and view user statistics.
The Sloan Digital Sky Survey (SDSS) is one of the most ambitious and influential surveys in the history of astronomy. Over eight years of operations (SDSS-I, 2000-2005; SDSS-II, 2005-2008; SDSS-III, 2008-2014; SDSS-IV, 2013-ongoing), it obtained deep, multi-color images covering more than a quarter of the sky and created 3-dimensional maps containing more than 930,000 galaxies and more than 120,000 quasars. SDSS-IV is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS Collaboration, including the Carnegie Institution for Science, Carnegie Mellon University, the Chilean Participation Group, Harvard-Smithsonian Center for Astrophysics, Instituto de Astrofísica de Canarias, The Johns Hopkins University, Kavli Institute for the Physics and Mathematics of the Universe (IPMU) / University of Tokyo, Lawrence Berkeley National Laboratory, Leibniz Institut für Astrophysik Potsdam (AIP), Max-Planck-Institut für Astrophysik (MPA Garching), Max-Planck-Institut für Extraterrestrische Physik (MPE), Max-Planck-Institut für Astronomie (MPIA Heidelberg), National Astronomical Observatory of China, New Mexico State University, New York University, The Ohio State University, Pennsylvania State University, Shanghai Astronomical Observatory, United Kingdom Participation Group, Universidad Nacional Autónoma de México, University of Arizona, University of Colorado Boulder, University of Portsmouth, University of Utah, University of Washington, University of Wisconsin, Vanderbilt University, and Yale University.
CSDCO is a US National Science Foundation (NSF) facility that supports drilling and coring in continental locations worldwide. Its holdings include drill core metadata and data, borehole survey data, geophysical site survey data, drilling metadata, and software code. CSDCO offers several repositories with samples, data, publications and reference collections about drilling and coring: LacCore Core Repository, Open Core Data, and Index to Marine and Lacustrine Geological Samples. For "Botanical Reference Collections", contact the LacCore Curator for details.
The Research Data Center (RDC) “International Survey Programs” provides researchers with data, services, and consultation on a number of important international study series which are under intensive curation by GESIS. They all cover numerous countries and, quite often, substantial time spans. The RDC provides optimal data preparation and access to a wide scope of data and topics for comparative analysis.
Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective.
The KNMI Data Centre (KDC) provides access to KNMI datasets on weather, climate and seismology. For each dataset, descriptive information (metadata) is available, including a point of contact. The data cover various topics, such as the most recent 10 minutes of observations, historical data, data on meteorological stations, modeling, earthquake data and satellite products.
The CliSAP-Integrated Climate Data Center (ICDC) allows easy access to climate-relevant data from in-situ measurements and satellite remote sensing. These data are important for determining the status of, and the changes in, the climate system. Additionally, some relevant re-analysis data are included, which are modeled on the basis of observational data.
This DOI repository provides permanent identifiers for data sets that are generated by life science researchers active in Sweden and for which no other suitable public repository is available. BILS is a distributed national research infrastructure, supported by the Swedish Research Council (Vetenskapsrådet), that provides bioinformatics support to life science researchers in Sweden.
RWTH Publications Research Data offers all RWTH Aachen University affiliates the organizational and technical means to electronically document and publish research data at this institutional repository. However, researchers are encouraged to prefer a subject-specific repository whenever one is appropriate and available. RWTH Aachen University is the largest technical university in Germany and one of nine 'German Universities of Excellence' (elite university). The University Library Aachen operates the repository as a member of the join community.
The primary objective of the PhenoCam project is to use automated, near-surface remote sensing to provide continuous, real-time monitoring of vegetation phenology across a range of ecosystems and climate zones.
LacCore curates cores and samples from continental coring and drilling expeditions around the world, and also archives metadata and contact information for cores stored at other institutions.
NSIDC offers hundreds of scientific data sets for research, focusing on the cryosphere and its interactions. Data are from satellites and field observations. All data are free of charge.
The NBN Atlas is a collaborative project that aggregates biodiversity data from multiple sources and makes it available and usable online. It is the UK’s largest collection of freely available biodiversity data.
OpenML is an open ecosystem for machine learning. By organizing all resources and results online, research becomes more efficient, useful and fun. OpenML is a platform to share detailed experimental results with the community at large and organize them for future reuse. Moreover, it will be directly integrated in today’s most popular data mining tools (for now: R, KNIME, RapidMiner and WEKA). Such an easy and free exchange of experiments has tremendous potential to speed up machine learning research, to engender larger, more detailed studies and to offer accurate advice to practitioners. Finally, it will also be a valuable resource for education in machine learning and data mining.
The Colombian Biodiversity Information Facility (SiB Colombia) is a national initiative established in early 2000 and coordinated by Instituto Humboldt to facilitate free and open access to biodiversity data. It comprises a network of more than 100 organizations (including universities, biological collections, research institutes, environmental authorities and NGOs among others) that work together to ensure that biodiversity data is available to support further research, education, policy making and incentive measures for the conservation and sustainable use of biodiversity. SiB Colombia’s mission is to facilitate the management of biodiversity data by bringing together users, publishers and data producers to support research, education and decision making related to knowledge, conservation and sustainable use of biodiversity and ecosystem services. SiB Colombia aims to consolidate the collaborative platform that facilitates the generation, use and democratization of knowledge on the biodiversity of Colombia. Thus, SiB Colombia contributes to a vision of a society that knows and values the biodiversity in which it is immersed, and uses such knowledge for its development.
Blackfynn Discover is a repository for neurology and neuroscience datasets. This repository, funded by DARPA, the NIH, and others, provides a user-friendly solution for publishing large, complex datasets in a scalable and sustainable way. The platform aims to make data available in a meaningful way and to drive adoption of cloud-based analysis over large datasets.