Walking through our computer science building, we can see ChatGPT on nearly every screen. Today, students can use AI at every stage of their learning process. For example, instead of struggling to figure out how to start a coding assignment, students can simply copy and paste the assignment question into an AI model. Even if the generated solution doesn’t work perfectly out of the box, they can re-prompt the model with its own code and a description of the error to receive a corrected solution.
We can’t help but compare this to our own experiences learning to program during undergrad. We remember the struggle of writing our first lines of code, the days spent debugging with friends at the student center, and the feeling of success when we finally fixed the bug after a night’s sleep. We didn’t enjoy being stuck in the moment, but looking back, we understand that working through these surmountable struggles was important for our learning. Our productive struggles not only helped us arrive at correct solutions in the short run, but also taught us how to write stronger, less error-prone code in the long run.
AI systems like ChatGPT are undeniably exciting, but they also challenge the very essence of how we, as humans, learn. These systems excel at tasks that require years of training to master, such as competitive mathematics or college-level programming. They are also becoming more accessible, ready to be used whenever a task poses the slightest bit of difficulty. Despite progress in AI, skills such as literacy are declining in both children and adults, raising the question: what aspects of learning do we want technology to cultivate?
We need to struggle in order to develop new skills. By “struggle”, we mean the effort students put into understanding a concept and working through challenges to uncover solutions that are not immediately obvious. While we may not enjoy the struggles of learning, struggle teaches persistence and deepens understanding. Our worry is that with AI, we may develop a habit of avoiding struggle, and that habit risks eroding the depth of our knowledge.
How do we preserve meaningful learning in a world where answers are just a prompt away?
The Evaluation Paradox
Our traditional paradigms for evaluating AI systems often rely on user satisfaction ratings or benchmark assessments – metrics that research has shown to be insufficient for education. For example, Hiroko Warshauer and James Hiebert show that effective support for struggle in pedagogical settings requires attention to multiple dimensions, such as the nature of a teacher’s language, the design of tasks, and the broader learning environment. Furthermore, studies by Arthur Glenberg and by Michael Pressley et al. have shown that students often overestimate their own understanding and may prefer systems that reduce struggle in the short term. How is this paradox between user preference and learning reflected in our current AI systems?
In our work on evaluating interactions between humans and AI systems – in this case, language models – on information-seeking tasks like question answering, we also observed a disconnect between users’ views of helpfulness and their task performance: the language models that users self-reported as helpful were not always the ones that led to higher task accuracy! We attributed this result to users placing misplaced trust in the “confident” and “definitive” language generated by certain language models, particularly those with additional fine-tuning. Only when these users encountered a confident-sounding answer that was obviously wrong did their assessment of the language model rapidly decline. On the other hand, a few users working with language models that provided indirect and lengthier answers trusted that their struggle was intentional and held value, including one participant who stated, “the task may not be as fun if the AI would give you all the answers!”

Another domain where this evaluation paradox occurs is rehabilitation technology, where it is crucial for the patient to trust that robot-assisted therapy (e.g., repetitive movements) will actually lead to long-term improvement. In our work on AI-assisted motor learning, we again saw that users self-reported a decreased preference for the type of AI assistance that actually led to improved learning. Our experiment asked participants to learn to control a vehicle in a simulated environment, and we found that our AI-based instruction encouraged participants to learn a new skill: successfully operating the vehicle in reverse. However, participants found reversing uncomfortable and frustrating. In this setting, the AI succeeded at helping students learn a new skill, leading to overall task improvement, but failed to inspire student resilience.

If students’ self-reported assessments and engagement aren’t necessarily reflective of learning, what does good teaching look like when students are struggling? Can we help teachers guide their students through productive struggle with AI?
Fostering Productive Struggle by Empowering Teachers
While many people can be teachers—from parents, mentors, and tutors to traditional classroom teachers—good teachers are hard to come by. Good teachers gain their expertise through years of training or trial and error, and students who most need experienced teachers often have the least access to them. This inequity affects both the quality of students’ education and the nature of how they struggle.
One exciting direction is using AI to help human educators create better moments of productive struggle for their students. In earlier research, we observed that novice educators have difficulty helping struggling students, particularly under time pressure. These educators were unsure how to nudge students and struggled to come up with the right thing to say on the spot. Without guidance on how to effectively foster productive struggle, they frequently defaulted to giving the student the solution. This meant missed opportunities to turn a student’s struggle into meaningful learning!
To address this, we developed Tutor CoPilot, an AI-powered system designed to provide live suggestions to human tutors on how to foster productive struggle. Tutor CoPilot uses a language model to generate expert-like suggestions for scaffolding the student’s learning, such as asking a guiding question or providing a hint. Unlike generic tools like ChatGPT, which risk handing students the answer and may not keep a student engaged for an entire hour of learning, Tutor CoPilot focuses on amplifying the tutor’s ability to foster productive struggle.
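To make this concrete, here is a minimal sketch of what a Tutor CoPilot-style suggestion generator might look like. This is not the actual Tutor CoPilot implementation: the system prompt, model name, and the suggest_next_move helper are illustrative assumptions, and we use the OpenAI chat API only as an example backend.

```python
# A minimal, hypothetical sketch of a Tutor CoPilot-style suggestion
# generator -- NOT the actual Tutor CoPilot implementation. The system
# prompt, model choice, and helper name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are an expert tutoring coach. Given a transcript of a live "
    "tutoring session, suggest ONE next move for the human tutor that "
    "fosters productive struggle: a guiding question or a hint. "
    "Never reveal the final answer or the full solution strategy."
)

def suggest_next_move(transcript: str, model: str = "gpt-4o-mini") -> str:
    """Return an expert-like scaffolding suggestion for the tutor."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Session transcript:\n{transcript}"},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Example: the tutor sees a live suggestion instead of the answer.
print(suggest_next_move(
    "Student: I got 3/4 + 1/2 = 4/6. I'm stuck.\n"
    "Tutor: [waiting for a suggestion]"
))
```

The key design choice here is the system prompt: the model coaches the tutor with a hint or a guiding question rather than handing the student a solution.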
We tested Tutor CoPilot’s guidance for tutors in a large randomized controlled trial and found that the technology improved student performance on math tests: students working with tutors who had access to Tutor CoPilot were 4 percentage points more likely to pass their math lesson tests.
But, what actually changed in the tutor’s instruction to enable students to learn better? Were the tutors actually fostering productive struggle?
When we analyzed the tutors’ language, we found that tutors who had access to Tutor CoPilot were indeed using language that better scaffolded learning and fostered productive struggle, such as prompting students to explain their answers. Tutors who didn’t have access to Tutor CoPilot more often gave away the answer and the solution strategy. By improving how tutors foster productive struggle, Tutor CoPilot helped students learn better!
Tutors with access to Tutor CoPilot used strategies that better fostered productive struggle, whereas tutors without access gave away answers and offered generic encouragement to students.

There are other exciting approaches that invite us to reimagine AI’s role in engaging real educators and students in productive struggle. For example, work from CU Boulder explores how AI can help students collaborate better with their peers by establishing community agreements. Work from Amplify leverages technology to make classroom learning more social by enabling students to try different ideas, share their mathematical observations, and have more meaningful classroom discussions. These examples illustrate how technology can support learning by inviting student curiosity and strengthening the human relationships between students and educators.
Conclusion
At its heart, learning is about so much more than finding the right answer: it’s about building resilience, fostering curiosity, and enriching the journey of discovery for students. Despite incredible advances in technology such as AI, human skills – from literacy to fine motor skills – continue to decline, with some blaming increased screen time and reliance on technology. In an era where AI can deliver instant solutions and gratification, we must reconsider how to actively preserve the essential aspects of learning.
We believe AI’s role in education isn’t to eliminate struggle but to enhance it. Whether by empowering educators with tools like Tutor CoPilot or by immersing students in inquiry-driven environments, we want to ensure that AI supports deeper, more meaningful learning experiences. Let’s build a future where AI systems encourage—not shortcut—meaningful learning from the get-go.
This article first appeared as a post on The Stanford AI Lab Blog.

