Predictions about AI’s role in education have been widespread – ranging from AI tutors to entirely automated classrooms.
While many of these predictions are premature, overhyped, or improbable, one forecast is already beginning to reshape education: machine learning tools are revolutionizing assessments in ways big and small.
Consider that over the past year alone, over 4,500 articles have been published that mention Large Language Models (LLMs) and education assessments. That’s more than the number of papers published on LLMs and many other topics, like chemistry or astronomy, over the same period.
Part of LLMs' strength in assessment stems from the fact that educational assessment is fundamentally a prediction task: assessments use data about what students know to estimate future performance, which is why assessment experts need skills in interpretation and statistics. AI excels at exactly this kind of prediction, which is why it stands to be game-changing for the field.
LLMs learn patterns in vast amounts of text, allowing them to generate coherent responses and solutions to prompts. For example, in generating math multiple-choice questions, new LLM tools like HEDGE train on human-created questions and answers to produce new questions along with explanations of the correct answers. As a new research paper explains, this streamlines the creation of high-quality assessments, reduces educator workload, and can enhance the consistency and fairness of evaluations, ultimately freeing more time for higher-level instruction.
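To make the idea concrete, here is a minimal sketch of automated multiple-choice item generation. It is a hypothetical illustration, not HEDGE's actual method: instead of a trained model, it builds an arithmetic item whose distractors come from common student error patterns, which is the kind of output such a tool aims to produce.

```python
import random

def make_mcq(a, b, rng=random.Random(0)):
    """Build a multiple-choice addition item whose distractors reflect
    common student slips (a toy illustration, not the HEDGE method)."""
    correct = a + b
    # Plausible wrong answers: off-by-one slips and operation confusion.
    distractors = {correct + 1, correct - 1, a * b, a - b}
    distractors.discard(correct)  # never duplicate the correct answer
    options = [correct] + sorted(distractors)[:3]
    rng.shuffle(options)
    return {
        "stem": f"What is {a} + {b}?",
        "options": options,
        "answer": correct,
        "rationale": f"{a} + {b} = {correct}; the other options reflect "
                     "common slips such as multiplying instead of adding.",
    }

item = make_mcq(7, 5)
```

A real tool would generate the stem, distractors, and rationale with a language model; the point here is only the shape of the output an item-generation pipeline delivers to educators.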
Another new paper looks at techniques such as retrieval-augmented generation, which incorporates vetted external knowledge into a model, and how this approach can enhance the effectiveness of tutoring. Again, the paper argues that an assessment of tutoring skills can improve the support provided to student learning. Put differently, by leveraging this technique, AI can raise the quality of both instruction and learning outcomes.
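The core of retrieval-augmented generation is simple: look up vetted reference material relevant to the student's question, then ground the model's response in it. The sketch below uses plain word overlap as a toy stand-in for the dense retrievers real systems use; the corpus and function names are illustrative assumptions, not the paper's implementation.

```python
def retrieve(query, corpus, k=1):
    """Rank vetted passages by word overlap with the query -- a toy
    stand-in for the dense retrieval used in real RAG systems."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, corpus):
    """Assemble a tutoring prompt grounded in the retrieved reference."""
    context = "\n".join(retrieve(question, corpus))
    return f"Use only this reference:\n{context}\n\nStudent question: {question}"

corpus = [
    "A fraction is in lowest terms when numerator and denominator share no common factor.",
    "The slope of a line measures its steepness as rise over run.",
]
prompt = build_prompt("How do I find the slope of a line?", corpus)
```

Because the model answers from retrieved, vetted text rather than from memorized patterns alone, the tutoring support it provides is easier to audit and less prone to hallucination.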
Another approach that appears frequently in research on educational assessment is reinforcement learning. One feedback-generation framework uses it to produce high-quality feedback on students' incorrect answers to math multiple-choice questions, optimizing for correctness and for alignment with a math feedback evaluation rubric. Automated feedback benefits students, who receive personalized, immediate direction, and educators, who spend less time on routine tasks.
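The optimization step can be sketched as scoring candidate feedback against a rubric and keeping the highest-reward candidate. The rubric below is a hypothetical reward function, and best-of-n selection here is a deliberately simple stand-in for the reinforcement-learning training the framework actually uses.

```python
def rubric_score(feedback, correct_answer):
    """Score candidate feedback on a simple three-part rubric
    (a hypothetical reward function, not the paper's actual rubric)."""
    score = 0
    if str(correct_answer) in feedback:
        score += 2  # correctness: states the right answer
    if "because" in feedback.lower():
        score += 2  # explanation: gives a reason, not just the answer
    if len(feedback.split()) <= 30:
        score += 1  # brevity: short enough for a student to read
    return score

def select_feedback(candidates, correct_answer):
    """Keep the highest-reward candidate -- best-of-n selection as a
    toy stand-in for reinforcement-learning optimization."""
    return max(candidates, key=lambda c: rubric_score(c, correct_answer))

candidates = [
    "Wrong. Try again.",
    "Not quite: the answer is 12 because 7 + 5 combines both addends.",
]
best = select_feedback(candidates, 12)
```

In a trained system the reward signal updates the model itself, so good feedback becomes the default output rather than something filtered after the fact.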
The use of AI feedback has also been evaluated from the educator’s perspective and in a real-world context. A recent study on Project Topeka, an AI-powered tool that provides instructional materials and immediate AI-generated feedback, found the tool can aid in improving student writing by enabling teachers to focus more on student engagement and in-depth instruction.
With automatic, in-depth feedback, students worked independently on their writing for longer. And because the tool took over reading drafts and providing feedback, teachers had more time to conference with students, particularly struggling writers, about understanding the feedback and strategizing revisions. By offloading routine feedback tasks, educators can dedicate more time to meaningful interactions with students, ultimately enhancing the quality of instruction and support for individual learners.
However, LLMs have their limitations and face significant challenges such as "hallucinations" (i.e., producing inaccurate information). In math especially, LLMs struggle to grasp underlying concepts and reasoning steps because they rely on patterns and statistics, mimicking memorization rather than understanding. Addressing these challenges requires careful curation of training data, large and representative training sets, and ongoing improvements to model architecture. LLMs can advance education, but developers must ensure appropriate representation and precision in their training and use.
No doubt, moving forward with LLMs in educational assessment will require careful consideration and risk evaluation. Because these models replicate patterns rather than reason from first principles, developers must ensure that training data is large and representative, that models handle complex concepts properly, and that well-researched, carefully implemented technical tools remain balanced with human expertise.
But when it comes to assessment, one thing is clear: AI will dramatically shift the way students are tested.