Competition Data

Dataset Description

The dataset presented here comprises discourse element types and effectiveness ratings for over 25,000 argumentative essays written by U.S. middle and high school students in response to 15 prompts. The training dataset will be the PERSUADE (Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements) Corpus.

Each essay has been broken down into its discourse elements and labeled by type:

  • Lead - an introduction that begins with a statistic, a question, a description, or some other device to grab the reader’s attention and point toward the thesis
  • Position - an opinion or conclusion on the main question
  • Claim - a claim that supports the position
  • Counterclaim - a claim that refutes another claim or gives an opposing reason to the position
  • Rebuttal - a claim that refutes a counterclaim
  • Evidence - ideas or examples that support claims, counterclaims, or rebuttals
  • Concluding Statement - a concluding statement that restates the claims

An effectiveness label is also included for each discourse element:

  • Effective - the discourse element is well presented and executed
  • Non-Effective - the discourse element is not presented or executed to the fullest extent

The annotation scheme for discourse element labels and discourse element effectiveness labels can be found here.
Note that there were previously three possible effectiveness labels: 1. Ineffective, 2. Adequate, and 3. Effective. The effectiveness label is now binary: 1. Non-Effective or 2. Effective.

A model should be built that can:

  • Segment an essay into meaningful, coherent units (i.e., discourse elements)
  • Predict the discourse element type label
  • Predict the discourse effectiveness label
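
To make these three tasks concrete, here is a minimal sketch of a model interface in Python (a sketch only; every name here is hypothetical and not part of any competition starter code):

    from dataclasses import dataclass

    @dataclass
    class DiscoursePrediction:
        predictionstring: str   # space-separated token indices, e.g. "3 4 5 6"
        discourse_type: int     # enumerated type label (0-6; see the submission spec below)
        p_non_effective: float  # predicted probability of Non-Effective
        p_effective: float      # predicted probability of Effective

    def predict_essay(full_text: str) -> list[DiscoursePrediction]:
        """Stub: a real model would (1) segment full_text into coherent
        discourse elements, (2) classify each element's type, and
        (3) score its effectiveness."""
        return []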

File and Field Information

train.csv – the training set, comprising the discourse element types and effectiveness ratings for each essay, identified by a unique essay_id_comp

  • essay_id_comp - unique essay identifier
  • full_text - full text of essay
  • discourse_id - discourse element identifier
  • discourse_start - left bound of discourse element segment, denoting the starting character position
  • discourse_end - right bound of discourse element segment, denoting the ending character position
  • discourse_type - class label identifying type of discourse element
  • predictionstring - a sequence of token indices corresponding to the discourse segment
  • discourse_text - literal text from the essay of the discourse element segment
  • discourse_effectiveness - quality rating of the discourse element segment
  • discourse_type_num - enumerated class label of the discourse element type
  • hierarchical_id - unique identifier of the hierarchical element
  • hierarchical_text - literal text from the essay of the hierarchical element
  • hierarchical_label - class label of the hierarchical element
  • holistic_essay_score - rating of the essay quality
  • source_text - title of accompanying source text(s)
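
To illustrate how the offset fields and predictionstring relate to the essay text, here is a minimal sketch using pandas (the whitespace tokenization is our assumption; see the annotation scheme for the authoritative tokenization):

    import pandas as pd

    train = pd.read_csv("train.csv")
    row = train.iloc[0]

    # The character offsets should recover the discourse text from the essay
    # (possibly up to surrounding whitespace).
    segment = row["full_text"][int(row["discourse_start"]):int(row["discourse_end"])]
    print(segment == row["discourse_text"])

    # predictionstring holds token indices into the essay; here we assume the
    # tokens come from a simple whitespace split of full_text.
    tokens = row["full_text"].split()
    indices = [int(i) for i in row["predictionstring"].split()]
    print(" ".join(tokens[i] for i in indices))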

test.csv – the test set, used to generate the predictions submitted to the leaderboard during the competition

  • essay_id_comp - unique essay identifier
  • full_text - full text of essay

sample_submission.csv – an example submission file in the correct format. See the Submission File section below for details.

  • essay_id_comp - unique essay identifier
  • predictionstring - a sequence of token indices corresponding to the segmented discourse
  • score_discourse_effectiveness_0 - a predicted probability of the segmented discourse’s effectiveness rating of Non-Effective
  • score_discourse_effectiveness_1 - a predicted probability of the segmented discourse’s effectiveness rating of Effective
  • discourse_type - the enumerated predicted class label of the segmented discourse’s rhetorical or argumentative type
    0 - Lead
    1 - Position
    2 - Claim
    3 - Evidence
    4 - Counterclaim
    5 - Rebuttal
    6 - Concluding Statement
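
For convenience, the same enumeration expressed as a Python mapping (a sketch derived directly from the list above):

    # Enumerated discourse_type codes, per the list above.
    DISCOURSE_TYPES = {
        0: "Lead",
        1: "Position",
        2: "Claim",
        3: "Evidence",
        4: "Counterclaim",
        5: "Rebuttal",
        6: "Concluding Statement",
    }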

Evaluation Metric

Submissions will be scored using a composite measure of accuracy, efficiency, and fairness across student race/ethnicity, English Language Learner (ELL) status, and economic categories in a single score. You can read more about the evaluation metric here.

Submission File

For each essay_id_comp in the test set, you must segment the essay into meaningful, coherent units (i.e., discourse elements), predict the discourse type label of the discourse element segment, and predict the effectiveness label of the discourse element segment. The file should contain a header and have the following format:

essay_id_comp predictionstring score_discourse_effectiveness_0 score_discourse_effectiveness_1 discourse_type
215B5CA132E4 3 4 5 6 7 8 9 10 11 0.483 0.517 0
215B5CA132E4 12 13 14 15 16 17 18 19 20 21 22 0.217 0.783 1
215B5CA132E4 23 24 25 26 27 28 29 30 31 32 33 34 35 36 0.521 0.479 3
215B5CA132E4 37 38 39 40 41 42 43 44 45 0.359 0.641 2
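
As a minimal sketch, a submission in this format could be assembled with pandas as follows (the rows reproduce the example above; consult sample_submission.csv for the authoritative header and delimiter):

    import pandas as pd

    # One row per predicted discourse element; values reproduce the example above.
    rows = [
        {"essay_id_comp": "215B5CA132E4",
         "predictionstring": "3 4 5 6 7 8 9 10 11",
         "score_discourse_effectiveness_0": 0.483,
         "score_discourse_effectiveness_1": 0.517,
         "discourse_type": 0},
    ]
    pd.DataFrame(rows).to_csv("submission.csv", index=False)
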
Submissions to the leaderboard will be evaluated as they are submitted. The leaderboard will be updated following each submission. At the close of the competition, all teams will submit one model for final evaluation.
The model submitted for final evaluation should include, but is not limited to, a Jupyter notebook (.ipynb) containing all data and code used to train the model, allowing for replication of model training and findings.
For more information on final model submission requirements, please see Section B of the Rules page.
