Learning Engineering Hub
Build your own learning engineering solutions
Build
These publicly available datasets, dataset collections, tools, libraries, and frameworks may help you get started conducting research and building your own learning engineering technology solutions. Do you think that a resource is missing? Suggest it to us.
Dataset | Organization(s) | Description | Location | Scale Of Study | Last Updated | Download Available? |
---|---|---|---|---|---|---|
DataShop@CMU | | A data repository and web application for learning science researchers that provides secure data storage plus analysis and visualization tools. | | | | |
DeepMind Mathematical Dataset (Analysing Mathematical Reasoning Abilities of Neural Models) | | A collection of 20 mathematical evaluation datasets that have been widely used at dozens of top artificial intelligence conferences, such as ACL, AAAI, and ICLR, since 2010. | | | | |
E-TRIALS Datasets from ASSISTments | | A collection of datasets related to grade school students’ interactions with the online math learning platform ASSISTments. | | | | |
Google Dataset Search | | Google’s search engine for datasets. | | | | |
Hugging Face | | A collection of datasets, models, and other resources for developing AI models and conducting research. | | | | |
ICPSR | | Maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. | | | | |
IPEDS | | A collection of data and general information on U.S. colleges, universities, and technical and vocational institutions. | | | | |
Kaggle | | Contains over 50,000 public datasets and 400,000 public notebooks for data analysis. | | | | |
LDbase | | An open science resource for the educational and developmental science communities, providing a secure place to store and access data, as well as access to materials about aspects of data management and analysis. | | | | |
LearnSphere | | A collection of tools, including data repositories, for learning research. | | | | |
NCES International Data Explorer | | A platform for exploring student and adult performance on international assessments. | | | | |
OER Hub | | A public digital library of open educational resources. | | | | |
Our World in Data | | Free and open-source charts and datasets on the world’s largest problems. | | | | |
Papers with Code | | A free and open resource with machine learning papers, code, datasets, methods, and evaluation tables. | | | | |
Roper Center | | A repository of public opinion and survey data operated out of Cornell University. | | | | |
Open Game Data | | An open-source collection of educational game datasets. | | | | |
Anaconda
Anaconda provides an open-source package library and package management system for Python and R for scientific
computing, including data science, machine learning, data processing, and
predictive analytics.
CMU PLUS
CMU PLUS tutor training lessons are freely available to all tutoring
organizations. They are publicly available on the CMU PLUS site at tutors.plus.
CTAT
Carnegie Mellon University’s Cognitive Tutor Authoring Tools (CTAT) is a tool for educational researchers,
regardless of their coding expertise, to develop cognitive tutors that guide students through problems and offer
timely and relevant assistance.
DataShop
DataShop is a data repository and web application that provides secure
data storage plus analysis and visualization tools for learning science researchers. It aims to support scientific
discovery in education by offering extensive public and private datasets and tools for educational research.
Doccano
Doccano is an open-source text annotation tool that provides annotation features for text
classification, sequence labeling, and sequence-to-sequence tasks.
GitHub
GitHub allows developers to collaboratively store, manage, track, and
control changes to their code.
Google Colab
Google Colab allows developers to write and execute Python code in a
browser, making sharing and collaborating easier, and to access computing
resources such as GPUs and TPUs.
Hugging Face
In addition to hosting over 93,000 datasets, Hugging Face is an open-source platform
for machine learning collaboration on large language models, datasets, and other
applications.
LearnLab
Carnegie Mellon University’s LearnLab works to enhance the scientific understanding of effective learning in
educational contexts and develop a research infrastructure for field experimentation, data collection, and data
mining.
LearnSphere
LearnSphere is a community software infrastructure that supports
sharing, analysis, and collaboration across a wide variety of educational data. LearnSphere supports researchers
as they improve their understanding of human learning. It also helps course developers and instructors improve
teaching and learning through data-driven course redesign.
LKT: Logistic Knowledge Tracing
A tool for computing Logistic Knowledge Tracing
(LKT), a method to track learning in an educational software system.
LLMs-4-EDU Citation Group
An open-access library of education-focused LLMs and research resources developed by John Whitmer. Once signed up, members can download, review, and contribute citations for peer-reviewed and pre-print publications. We are particularly interested in new research studies with outcome/impact evaluations of LLMs in applied settings. Please add new references to the “a – uncategorized” group, and they will be organized into the appropriate category.
MoFaCTS
MoFaCTS is an educational platform that supports data-driven learning
through content modules, user management, and reporting features for admins and teachers.
ParlAI
ParlAI offers popular datasets, reference models, and integration of Amazon
Mechanical Turk to share, train, and evaluate dialogue models across tasks
such as open-domain chat, task-oriented dialogue, and visual question
answering.
Penn Center for Learning Analytics
The Penn Center for Learning Analytics offers a collection of open-source
tools and frameworks related to educational data.
PyTorch
PyTorch is an end-to-end machine learning framework that facilitates
experimentation and production through a C++ front-end platform, distributed
training, and an ecosystem of resources.
Ryan Baker’s Educational Tools and Frameworks
This page provides a collection of
open-source educational tools and frameworks developed by Ryan Baker and colleagues.
Scikit-learn
Scikit-learn is an open-source machine learning library for Python. The
library includes tools for foundational ML practices such as model fitting,
data preprocessing, model selection, and model evaluation.
TensorFlow
TensorFlow is an end-to-end machine learning platform that provides tools
for preparing data, building and deploying models, and implementing MLOps.
Tigris
Tigris is a workflow authoring tool that is part of the community software
infrastructure being built for the LearnSphere project. The platform facilitates the creation and sharing of
custom analyses, as well as interactions with external repositories, such as DataShop, MOOCdb, DiscourseDB, and
DataStage.
Torus
Building on the work of Carnegie Mellon University’s Open Learning Initiative, Torus is an open platform
allowing users to author, deliver, improve, and research learning experiences.
The UDL Guidelines
The Universal Design for Learning (UDL) Guidelines are a tool within the UDL
framework aimed at enhancing teaching and learning. The tool helps educators, curriculum developers, and
researchers provide accessible, engaging, and challenging learning opportunities for all learners.
Unizin’s Data Platform
The Unizin Data Platform (UDP) integrates, normalizes, and warehouses educational
data from diverse sources such as LMS, SIS, and LTI tools, enabling higher education institutions to effectively
use learning analytics and data for student success initiatives.
Wikidata for Education
Wikidata for Education is an initiative that aims to align open educational resources with local, national, and
international curriculum frameworks to support teachers and students in achieving their educational goals.
Wolfram
Wolfram offers free public resources in advanced computation,
including WolframAlpha, a platform that
introduces knowledge-based computing through algorithms
and AI in math, science, and technology.