Building Smarter Feedback: How Algorithms Can Advance The Platforms Teachers Depend On

The Cutting Ed
December 12, 2025
Jules King, Kennedy Smith

Teachers today are navigating choppy waters. With large class sizes, ongoing teacher shortages, and limited time, providing personalized feedback to every student has become a difficult task. However, AI-driven tools are beginning to ease this strain. These tools, powered by machine learning, large language models, and supporting datasets, help educators deliver feedback more efficiently, consistently, and effectively.

For years, The Learning Agency has been invested in this effort. Through 10 Kaggle competitions, the organization has created open-source datasets and algorithms that advance both education research and classroom practice. Two of these competitions, Charting Student Math Misunderstandings (MAP) and Robust Algorithms for Thorough Essay Rating (RATER), demonstrate the power of data-driven innovation to transform feedback, reveal student thinking, and expand teachers' capacity to support learning.

The Learning Agency’s Mission

The Learning Agency is dedicated to pushing educational innovation forward to help solve some of the most persistent challenges in the field. Already, a range of AI-supported platforms is helping educators diagnose math errors, provide essay feedback, and guide students through the learning process. But better tools depend on better data and better models.

Data science competitions offer a unique way to accelerate this progress. By inviting researchers, developers, and interdisciplinary teams to solve real educational problems, these competitions generate high-quality datasets and spark new insights that platform developers can adopt and build on. The MAP and RATER competitions illustrate how this collaborative approach translates into practical breakthroughs that can benefit classrooms.

MAP - Charting Student Math Misunderstandings

The MAP competition was designed to advance methods for identifying and categorizing math errors among students in grades 4 through 8. The goal was not simply to determine whether an answer was right or wrong, but to understand the thinking behind a student’s response. Misconceptions in math can take root early and, if left unaddressed, may compound into major barriers later on. By surfacing patterns in student reasoning, MAP aimed to help educators intervene earlier and more effectively.

To accomplish this, the competition used a rich dataset of student responses from Eedi, a UK-based learning platform. On the Eedi platform, students answered multiple-choice questions, each containing one correct answer and three carefully designed distractors, and could add an optional written explanation. Annotators labeled each response according to a misconception taxonomy grounded in well-established research on math cognition. The taxonomy served as a roadmap of common misunderstandings, capturing not only procedural slips but also deeper conceptual struggles.
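
To make that structure concrete, a single annotated response in a dataset of this kind might look like the sketch below. This is a hypothetical Python representation; the field names and label value are illustrative assumptions, not the actual MAP schema.

```python
# Hypothetical sketch of one annotated MAP-style record.
# Field names and the misconception label are illustrative assumptions,
# not the official competition schema.
from dataclasses import dataclass

@dataclass
class AnnotatedResponse:
    question_id: str          # which multiple-choice item the student answered
    selected_answer: str      # the option the student chose (correct or distractor)
    is_correct: bool          # whether the chosen option was the keyed answer
    explanation: str          # optional free-text explanation written by the student
    misconception_label: str  # taxonomy category assigned by a human annotator

example = AnnotatedResponse(
    question_id="frac_add_012",
    selected_answer="2/6",
    is_correct=False,
    explanation="I added the tops and the bottoms.",
    misconception_label="adds_numerators_and_denominators",
)
```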

Participation in the competition was remarkably strong. More than 1,850 teams submitted almost 40,000 entries, experimenting with a wide range of modeling approaches. The most effective solutions combined ensembles of large language models with creative strategies, such as training models to predict sub-problems or averaging outputs from multiple models. A detailed report on these strategies is available here.
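
As a rough illustration of the ensembling idea behind many of these solutions, the sketch below averages class probabilities from several independently trained classifiers and picks the most likely misconception label. It is a generic pattern, assuming model objects that expose a predict_proba method, not a reconstruction of any particular winning entry.

```python
# Minimal sketch of probability averaging across an ensemble of classifiers.
# The model objects and label list are placeholders, not a specific solution.
import numpy as np

def ensemble_predict(models, texts, labels):
    """Average each model's predicted class probabilities and return the top label per text."""
    # Each model is assumed to expose predict_proba(texts) -> array of shape (n_texts, n_labels).
    probs = np.mean([m.predict_proba(texts) for m in models], axis=0)
    return [labels[i] for i in probs.argmax(axis=1)]
```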

This competition demonstrated a strong proof of concept: tools and platforms can build highly accurate AI models capable of identifying math misconceptions. That capability can lead to more targeted feedback and early interventions that prevent misconceptions from becoming entrenched.

The impact of MAP extends far beyond the competition itself. The MAP dataset and models are open source, offering researchers and developers the opportunity to build on this work, test interventions, explore new research questions, and track the effects of personalized feedback. As the field moves forward, expanding the taxonomy beyond grades 4 through 8 could create an even more powerful framework. A broader set of error types, applied across more grade levels, could eventually inform teacher training and enhance the design of math instructional materials.

The RATER Competition

The Robust Algorithms for Thorough Essay Rating (RATER) competition built on two previous Feedback Prize challenges and aimed to create a unified, efficient, and fair model for evaluating student writing across multiple dimensions. Writing instruction is a particularly time-intensive endeavor, and teachers often struggle to provide the detailed feedback students need to improve. 

The competition used the PERSUADE dataset, a collection of 25,000 argumentative essays written by U.S. middle and high school students. The dataset was noteworthy both for its scale and diversity, containing writing from students with varied linguistic and socioeconomic backgrounds. Essays were labeled for discourse elements such as claims, evidence, and position, as well as for the effectiveness of those elements and for overall writing quality.
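
One way to picture an annotated essay in a dataset like PERSUADE is as full text plus a list of labeled spans. The sketch below is a hypothetical Python representation; the keys and label values are assumptions for illustration, not the dataset's actual column names.

```python
# Hypothetical representation of one annotated essay with discourse-element spans.
# Keys and label values are illustrative, not the PERSUADE column names.
essay = {
    "essay_id": "A1B2C3",
    "full_text": "Schools should adopt later start times. Studies show ...",
    "discourse_elements": [
        {"start": 0,  "end": 44,  "type": "Position", "effectiveness": "Effective"},
        {"start": 45, "end": 120, "type": "Evidence", "effectiveness": "Adequate"},
    ],
    "holistic_score": 4,  # overall writing-quality rating
}
```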

Five expert teams, including Kaggle Masters, Grandmasters, and leading NLP researchers, participated under the guidance of a technical consulting team to establish strong benchmarks. The top-performing models reached human-level accuracy, ran efficiently, and offered a simple, user-friendly implementation. One leading model achieved more than 72 percent accuracy in identifying discourse types and over 86 percent accuracy in labeling effectiveness. Importantly, these models demonstrated minimal bias across race, ethnicity, and English-learner status.
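
A simple way to check both overall accuracy and the kind of subgroup parity described here is to compare accuracy across demographic groups. The sketch below assumes a pandas DataFrame with gold labels, predictions, and a group column; the column names are assumptions, and the check is illustrative rather than the competition's actual fairness analysis.

```python
# Sketch: overall accuracy plus per-group accuracy gaps as a rough bias check.
# Column names (gold, pred, group) are assumptions for illustration.
import pandas as pd

def accuracy_by_group(df: pd.DataFrame) -> pd.Series:
    correct = df["pred"] == df["gold"]
    overall = correct.mean()
    per_group = correct.groupby(df["group"]).mean()
    print(f"Overall accuracy: {overall:.3f}")
    # Positive values mean the group scores above overall accuracy, negative below.
    return per_group - overall

# Example usage (predictions_df is a hypothetical DataFrame of model outputs):
# gaps = accuracy_by_group(predictions_df)
# print(gaps.sort_values())
```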

The algorithms developed through RATER were implemented in three platforms, TeeRead, ThinkCERCA, and PapyrusAI, which collectively served more than 400,000 students by the end of 2025. Across these platforms, similar patterns emerged. Students revised their writing more frequently and more thoughtfully. They strengthened their claims, improved their use of evidence, and became more reflective writers. Feedback delivered through RATER-powered tools was highly specific, helping students understand not only what needed improvement but also how to improve it.

Teachers benefited as well. With AI shouldering part of the feedback load, educators gained valuable time to focus on instruction and mentorship. Rather than replacing teachers, RATER functioned as a pedagogical partner, one that enhanced formative assessment, informed lesson planning, and supported richer classroom discussions about writing. The models and dataset are publicly available, providing opportunities for further research, development, and integration into new educational tools.

Future Directions

There are many opportunities to extend the work that MAP and RATER have begun. As previously mentioned, expanding the MAP taxonomy beyond middle-grade math could create a more comprehensive understanding of misconceptions across K-12. A richer taxonomy could also be integrated into teacher training, helping educators recognize and respond to common challenges more effectively. Similarly, RATER could be extended to additional writing genres such as expository or narrative writing, allowing AI to provide more precise and meaningful feedback across different forms of student expression. Continued collaboration among educators, researchers, and data scientists will be essential to ensuring that AI-driven solutions remain practical, equitable, and impactful.

Building Supportive Classrooms Through AI Innovation

The pressures facing teachers today make personalized feedback difficult, but MAP and RATER show how AI can help fill that gap. By creating open datasets, new taxonomies, and high-performing models, these competitions can enhance tools that lighten teacher workload while strengthening student learning. Their impact demonstrates that when educators and researchers collaborate, technology can meaningfully support the classroom, reducing strain, increasing equity, and helping teachers focus on supporting their students.

Jules King

Program Manager

Kennedy Smith

Program Associate
