RATER Challenge Details

Challenge Task
The Robust Algorithms for Thorough Essay Rating (RATER) challenge focused on creating more efficient versions of the algorithms developed in the original Feedback Prize competition series. The task was to develop an efficient “super-algorithm” that could:
- Segment an essay into individual argumentative elements and classify each element according to its argumentative type (similar to the task in the “Feedback Prize - Evaluating Student Writing” Kaggle competition), and
- Evaluate the effectiveness of each discourse element (similar to the task in the “Feedback Prize - Predicting Effective Arguments” Kaggle competition).
Teams worked with pre-existing models developed in the Feedback Prize competitions or built entirely new solutions from scratch.
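To make the two subtasks concrete, here is a minimal Python sketch of one plausible output structure for such a “super-algorithm”: each essay maps to a list of discourse elements, each carrying a character span, an argumentative type, and an effectiveness label. The type and effectiveness label sets mirror those used in the Feedback Prize competitions; the class and function names are illustrative only, not part of the challenge specification.

```python
# Minimal sketch (illustrative, not the official pipeline) of the combined
# prediction format: every essay is segmented into discourse elements, and
# each element carries an argumentative type plus an effectiveness label.
from dataclasses import dataclass
from typing import List

DISCOURSE_TYPES = [
    "Lead", "Position", "Claim", "Counterclaim",
    "Rebuttal", "Evidence", "Concluding Statement",
]
EFFECTIVENESS_LABELS = ["Ineffective", "Adequate", "Effective"]

@dataclass
class DiscourseElement:
    start: int            # character offset where the element begins
    end: int              # character offset where the element ends (exclusive)
    discourse_type: str   # one of DISCOURSE_TYPES
    effectiveness: str    # one of EFFECTIVENESS_LABELS

def predict_elements(essay_text: str) -> List[DiscourseElement]:
    """Placeholder 'super-algorithm': a real solution would segment the essay
    (e.g., with a token-classification model), type each segment, and then
    score its effectiveness. Here a single dummy element is returned."""
    return [DiscourseElement(0, min(len(essay_text), 80), "Lead", "Adequate")]

if __name__ == "__main__":
    essay = "Driverless cars are coming. Some believe they will make roads safer..."
    for element in predict_elements(essay):
        print(element)
```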
Challenge Goal
All automated writing feedback tools have limitations. Many fail to identify the structural elements of argumentative writing, such as thesis statements or supporting claims, or fail to evaluate the quality of these argumentative elements. Additionally, most available tools are proprietary or inaccessible to educators because of their cost. This problem is compounded for under-resourced schools, which serve a disproportionate number of students of color and students from low-income backgrounds. In short, the field of automated writing feedback is ripe for innovation that could help democratize education.
This competition sought to create an algorithm that predicts effective arguments and evaluates student writing overall. Such an algorithm would give students higher-quality, more accessible automated writing tools.


Data Overview
The PERSUADE (Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements) dataset was used as the training dataset for the competition. This dataset consists of over 25,000 argumentative essays annotated for discourse elements.
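As a rough illustration, the snippet below loads a PERSUADE-style annotation table with pandas and prints a few corpus statistics. The file path and column names (essay_id, discourse_text, discourse_type, discourse_effectiveness) are assumptions modeled on the publicly released Feedback Prize data, not necessarily the exact PERSUADE distribution.

```python
# Hedged sketch: inspect a PERSUADE-style annotation file with pandas.
# The file path and column names below are assumptions based on the public
# Feedback Prize releases; adjust them to match the actual dataset files.
import pandas as pd

annotations = pd.read_csv("persuade_annotations.csv")  # hypothetical path

# One row per annotated discourse element.
print(annotations[["essay_id", "discourse_text",
                   "discourse_type", "discourse_effectiveness"]].head())

# Rough corpus statistics: number of essays and elements per discourse type.
print("essays annotated:", annotations["essay_id"].nunique())
print(annotations["discourse_type"].value_counts())
```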
Results
Each submission included, for every predicted discourse element, a discourse type label and a discourse effectiveness label. Submissions were scored using a custom evaluation metric. You can read more about the evaluation metric here. An unofficial sketch of one common span-scoring approach follows the leaderboard below.
| Rank | Team | Highest Score |
| --- | --- | --- |
| 1 | Kkiller (k.neroma) | 0.6622 |
| 2 | Døge Learner (h.abedi) | 0.6474 |
| 3 | Catalpa (andrea1) | 0.6253 |
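The official RATER metric is not reproduced here. As a rough, unofficial illustration of how span-level predictions are often scored in this family of competitions, the sketch below matches predicted elements to ground-truth elements by mutual overlap (at least 50% in each direction, as in the original Feedback Prize competition) and counts a prediction correct only when both its type and effectiveness labels agree with the matched ground truth. The overlap threshold, the greedy matching, and the final accuracy figure are assumptions, not the challenge's scoring rule.

```python
# Unofficial illustration only: overlap-based matching of predicted discourse
# elements against ground truth, requiring both labels to agree. This is NOT
# the RATER challenge's custom metric, just a common span-scoring pattern.
from typing import List, Tuple

# (start, end, discourse_type, effectiveness)
Element = Tuple[int, int, str, str]

def overlaps_enough(pred: Element, truth: Element, threshold: float = 0.5) -> bool:
    """True if the spans overlap by at least `threshold` of each span's length."""
    inter = max(0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    pred_len = max(1, pred[1] - pred[0])
    truth_len = max(1, truth[1] - truth[0])
    return inter / pred_len >= threshold and inter / truth_len >= threshold

def toy_score(preds: List[Element], truths: List[Element]) -> float:
    """Fraction of ground-truth elements matched (greedily, one-to-one) by a
    prediction with the same discourse type and effectiveness label."""
    used = set()
    correct = 0
    for t in truths:
        for i, p in enumerate(preds):
            if i in used:
                continue
            if overlaps_enough(p, t) and p[2] == t[2] and p[3] == t[3]:
                used.add(i)
                correct += 1
                break
    return correct / max(1, len(truths))

if __name__ == "__main__":
    truths = [(0, 80, "Lead", "Adequate"), (81, 200, "Claim", "Effective")]
    preds = [(0, 75, "Lead", "Adequate"), (90, 210, "Claim", "Ineffective")]
    print(toy_score(preds, truths))  # 0.5: only the Lead element matches on both labels
```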