Our research at Georgia State University, a Minority Serving Institution, had students test multiple LLM tutors to determine which designs serve learners best. From this study, we identified five essential recommendations for optimizing AI-based tutoring systems.
First, students benefit from having multiple hint styles. Some prefer a Socratic, question-based approach that encourages deep thinking, while others want direct and concise feedback. Providing a toggle between these options ensures that learners can engage in a way that suits their needs.
Second, debugging approaches should be tailored to the learner’s experience level. Novices require step-by-step guidance to develop their problem-solving skills, while advanced coders prefer pinpointed error identification to streamline their workflow. Adaptive debugging mechanisms that adjust based on user behavior can significantly enhance the effectiveness of LLM tutors.
Third, balancing reflection and rapid-fire feedback is essential for quiz and practice sessions. Some students benefit from open-ended exploration, while others prefer immediate yes/no checks to confirm their understanding quickly. Providing both modes allows students to switch between exploratory learning and efficiency-driven review.
Fourth, encouraging independence is key to effective learning. Instead of providing full solutions immediately, tutors should use incremental hint reveals. This ensures that students attempt to solve problems independently before receiving direct answers, reinforcing their problem-solving skills while preventing frustration from prolonged struggles.
Finally, seamless UI/UX integration is crucial for maintaining student engagement. LLM tutors should function within in-app code editing environments or integrate with popular IDEs to minimize disruptions caused by constant toggling between tools. Even minor interface improvements, such as a “paste snippet” button, can significantly enhance usability and efficiency.
These key takeaways are essential for any developer looking to refine LLM-based tutoring solutions. Let’s delve into the details of our research and discuss how to integrate these insights into effective AI-driven education tools.
Why We Studied LLM Tutors And Student Success
Large Language Model (LLM) tutors, such as Khan Academy’s Khanmigo and Harvard’s CS50.ai, offer scalable and personalized learning opportunities. However, their effectiveness, particularly for students attending Minority Serving Institutions (MSIs), remains an open question. Many of these students juggle academic responsibilities alongside work and family commitments, making efficient, adaptable tutoring all the more important.
To explore these challenges, our research team at Georgia State University, in collaboration with MIT and funded by Axim Collaborative, conducted a study within an introductory Python programming course. Students used LLM tutors for various tasks, including concept comprehension, debugging, quiz preparation, and program development. Through focus groups, chat log analysis, and direct student feedback, we identified both strengths and areas needing improvement. Here are our five key takeaways, along with a roadmap for implementation.
Five Key Takeaways for Practitioners
Students Want Options in How to Learn and Interact
Student preferences for tutor interaction styles varied widely. Some learners thrived on Socratic, question-based prompts that encouraged deeper thinking, while others found such interactions frustrating and preferred direct explanations. Additionally, some students appreciated empathetic messages like “Great job!” while others considered them distracting. The variation extended to tone as well—some learners favored a conversational style, while others preferred succinct bullet points.
Observations from our focus groups highlighted these preferences in practice. Khanmigo frequently employed open-ended questions such as “Why might it not be printing anything?” This approach encouraged reflection but sometimes felt too slow for time-pressed learners. In contrast, CS50.ai provided direct answers, such as pinpointing errors at specific lines of code, which advanced coders appreciated but could overwhelm beginners.
The implication for LLM tutor design is clear: offering a hint-style toggle that allows students to switch between reflective and direct feedback can improve engagement. Additionally, providing an opt-in setting for motivational messages ensures that students who benefit from encouragement can enable it, while others remain focused without distraction.
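As a concrete illustration, here is a minimal sketch of what such a toggle might look like on the backend, with the student’s preference steering the tutor’s system prompt. The style names, prompt wording, and `TutorPreferences` fields are our own illustrative assumptions, not drawn from either tutor’s implementation.

```python
from dataclasses import dataclass

# Illustrative prompt fragments for each hint style (assumed wording).
STYLE_PROMPTS = {
    "socratic": (
        "Respond with guiding questions that push the student to reason "
        "through the problem themselves. Do not state the answer directly."
    ),
    "direct": (
        "Give a concise, direct explanation of the issue and how to fix it, "
        "using short bullet points where possible."
    ),
}

@dataclass
class TutorPreferences:
    hint_style: str = "socratic"   # "socratic" or "direct"
    encouragement: bool = False    # motivational messages are opt-in

def build_system_prompt(prefs: TutorPreferences) -> str:
    """Assemble the tutor's system prompt from student-chosen settings."""
    parts = ["You are a tutor for an introductory Python course.",
             STYLE_PROMPTS[prefs.hint_style]]
    if prefs.encouragement:
        parts.append("Offer brief, genuine encouragement when the student makes progress.")
    else:
        parts.append("Keep a neutral tone; omit praise and small talk.")
    return " ".join(parts)

# A student flips the toggle mid-session; the next request simply
# rebuilds the system prompt with the new preferences.
print(build_system_prompt(TutorPreferences(hint_style="direct", encouragement=True)))
```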
Debugging Needs Vary with Experience
Students at different skill levels require different debugging strategies. Advanced coders favor direct error identification and quick fixes, while novices benefit from iterative, step-by-step approaches that guide them through the debugging process.
Our focus groups revealed that confident programmers appreciated CS50.ai’s immediate issue identification, but beginners found it overwhelming or overly technical. Conversely, Khanmigo’s iterative feedback approach was helpful for novices but occasionally felt too broad or inefficient for experienced users. Some students also struggled when tutors introduced concepts not yet covered in class.
The solution is to develop adaptive debugging logic. If a learner repeatedly struggles with a problem, the tutor should shift to slower, more detailed hints. For advanced users who demonstrate proficiency, feedback should remain concise and direct. This dynamic approach allows LLM tutors to cater to students’ individual learning curves effectively.
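A minimal sketch of this adaptive logic follows, assuming the tutor can count a student’s failed attempts per problem. The three hint tiers, their wording, and the escalation rule are illustrative assumptions, not the logic of any existing tutor.

```python
from collections import defaultdict

# Hint tiers, ordered from least to most detailed (illustrative wording).
HINT_TIERS = [
    "Point to the general area of the bug, e.g., 'check your loop bounds'.",
    "Name the offending line and the kind of error, without giving the fix.",
    "Walk through the failing case step by step and suggest a concrete fix.",
]

class AdaptiveDebugger:
    def __init__(self) -> None:
        self.failed_attempts: defaultdict[str, int] = defaultdict(int)

    def record_failure(self, problem_id: str) -> None:
        self.failed_attempts[problem_id] += 1

    def hint_instruction(self, problem_id: str, proficient: bool) -> str:
        """Proficient users get concise, pinpointed feedback; struggling
        users are escalated to slower, more detailed tiers."""
        if proficient:
            return HINT_TIERS[1]
        tier = min(self.failed_attempts[problem_id], len(HINT_TIERS) - 1)
        return HINT_TIERS[tier]

debugger = AdaptiveDebugger()
debugger.record_failure("lab3-q2")
debugger.record_failure("lab3-q2")
print(debugger.hint_instruction("lab3-q2", proficient=False))  # most detailed tier
```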
Balancing Reflection and Rapid-Fire in Quiz Prep
Different learners have different needs when it comes to quiz preparation. Some benefit from deeper, reflective discussions, while others require quick yes/no checks or brief hints when reviewing under time constraints.
Khanmigo excels at tutor-guided prompting, which supports conceptual exploration but may be too open-ended for rapid quiz review. CS50.ai, on the other hand, provides concise responses, making it ideal for advanced students cramming for a test but potentially leaving novices feeling unsupported.
To address these differences, LLM tutors should offer two quiz-prep modes. One mode can focus on deeper mastery through Socratic questioning, while the other enables rapid checks with immediate feedback. This flexibility ensures that students can engage with content in a manner that aligns with their specific study needs.
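In code, the two modes can be as simple as two instruction templates behind one switch. The mode names and prompt wording below are our own assumptions, offered only to make the design concrete.

```python
# Two quiz-prep modes behind a single switch (illustrative prompts).
QUIZ_MODES = {
    "mastery": (
        "Ask one open-ended question at a time about the topic, wait for the "
        "student's reasoning, then probe it with a follow-up question."
    ),
    "rapid": (
        "Ask short questions with definite answers. After each response, reply "
        "only 'correct' or 'incorrect', plus a one-line hint if incorrect."
    ),
}

def quiz_prompt(mode: str, topic: str) -> str:
    """Build the quiz-session instruction for the selected mode."""
    return f"Topic: {topic}. {QUIZ_MODES[mode]}"

# A student reviewing under time pressure picks the rapid mode.
print(quiz_prompt("rapid", "Python list slicing"))
```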
Encouraging Independence in Program Development
Effective learning requires balancing guidance with independent problem-solving. Providing full solutions too soon can hinder student learning, while insufficient support can leave beginners stuck and frustrated.
In our study, Khanmigo broke tasks into smaller subproblems, which was beneficial for novices but restrictive for advanced users. Conversely, CS50.ai provided direct solutions, which helped experienced coders but often left beginners struggling to understand the underlying concepts.
To optimize learning outcomes, LLM tutors should use incremental hinting. This means offering subtle clues first, followed by partial code snippets, and revealing full solutions only if the student remains stuck after multiple attempts. Adding reflection prompts like “Why does this snippet fix the problem?” reinforces comprehension and encourages deeper learning.
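One way to implement this is a hint ladder that advances one rung per request and withholds the full solution until the student has made enough independent attempts. This sketch is ours; the rung contents and the three-attempt threshold are illustrative assumptions.

```python
# Rungs ordered from subtle clue to full solution (illustrative wording).
HINT_LADDER = [
    "Subtle clue: restate the goal and ask what the loop should produce first.",
    "Partial snippet: show the loop header only, with the body left as TODO.",
    "Full solution: show working code, then ask 'Why does this snippet fix the problem?'",
]

class HintLadder:
    def __init__(self, min_attempts_for_solution: int = 3) -> None:
        self.requests: dict[str, int] = {}  # task_id -> hint requests so far
        self.min_attempts = min_attempts_for_solution

    def next_hint(self, task_id: str) -> str:
        """Return the next rung, holding back the full solution until the
        student has made enough attempts on this task."""
        count = self.requests.get(task_id, 0)
        rung = min(count, len(HINT_LADDER) - 1)
        if rung == len(HINT_LADDER) - 1 and count < self.min_attempts:
            rung -= 1  # not enough attempts yet; repeat the partial snippet
        self.requests[task_id] = count + 1
        return HINT_LADDER[rung]

ladder = HintLadder()
for _ in range(4):  # the fourth request finally unlocks the full solution
    print(ladder.next_hint("hw2-fizzbuzz"))
```

Note how the final rung pairs the solution with the reflection prompt, so even a revealed answer still asks the student to explain why it works.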
Ensuring a Seamless Flow Through UI/UX Integration
Students expressed frustration with having to switch between multiple windows and copy/paste frequently. A seamless, frictionless user interface, ideally integrated with in-app code editing or directly linked to popular IDEs, significantly improves the tutoring experience.
Our research revealed that many participants found the need to toggle between the LLM tutor’s chat window and a separate IDE cumbersome. Additionally, the lack of simple code-upload options slowed debugging, and students expressed a desire for code visualization or error highlighting within the tutor environment.
To enhance usability, developers should focus on streamlining the UI/UX. Features such as a built-in code editor, direct integration with IDEs like VS Code, and a simple “paste snippet” button can greatly improve efficiency and reduce distractions.
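On the backend, even a “paste snippet” button largely reduces to packaging the student’s code with enough context that no second copy/paste round trip is needed. The field names below are hypothetical; this is a minimal sketch of that packaging step, not any tutor’s actual interface code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Snippet:
    code: str
    filename: str = "untitled.py"
    first_line: int = 1
    error: Optional[str] = None  # traceback or error message, if any

def snippet_to_chat_message(s: Snippet) -> str:
    """Format a pasted snippet as one self-describing chat message, so the
    tutor sees the code, its location, and the error together."""
    header = f"File {s.filename}, starting at line {s.first_line}"
    if s.error:
        header += f" (error: {s.error})"
    return header + "\n" + s.code

msg = snippet_to_chat_message(
    Snippet(code="for i in range(3)\n    print(i)",
            error="SyntaxError: expected ':'")
)
print(msg)
```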
Conclusion
LLM tutors have the potential to transform education, particularly for students with diverse backgrounds and learning needs. Our study highlights five critical areas for improvement: offering flexible interaction styles, implementing adaptive debugging, balancing reflection with rapid quiz prep, using incremental hinting, and optimizing UI/UX for seamless integration.
By co-designing with actual users, developers can create LLM tutors that adapt to different learning preferences and skill levels. Rolling out features in phases, gathering real-world feedback, and refining tools based on user needs will lead to more effective and personalized educational experiences. Ultimately, these improvements empower students, making learning more engaging, accessible, and successful.
Note: For details of the study, see the following publication: Rai, Arun; Chen, Liwei; Breazeal, Cynthia; Ramesh, Balasubramaniam; Long, Yuan; and Aria, Andrea, “Design and Evaluation Attributes for Scalable, Cost-Effective Personalization of LLM Tutors in Programming Education” (2024). International Conference on Information Systems Proceedings. https://aisel.aisnet.org/icis2024/learnandiscurricula/learnandiscurricula/9
This article was written by Arun Rai, Liwei Chen, Cynthia Breazeal, Balasubramaniam Ramesh, Yuan Long, and Andrea Aria.