The advancement of Large Language Models (LLMs) and AI chatbots like ChatGPT has generated significant interest, making headlines around the world. Some of the coverage is overhyped, warning that LLMs will lead to the demise of world civilization; other headlines caution that LLMs might not be powerful enough.
Whatever the case, advanced technologies like LLMs offer great potential for improving educational outcomes, and more can be done to leverage these advancements for the field. So our team decided to customize ChatGPT around the Doing What Works (DWW) Library, which helps educators more easily access research on successful classroom practices. We created a prototype chatbot that gives evidence-based education advice to teachers and administrators on request, aiming to show how LLMs can help scale best practices in teaching and learning.
We followed the work of AI expert Arjun Sha, and overall, the chatbot was surprisingly successful. In our qualitative analysis, we found that the DWW-trained ChatGPT outperformed the baseline ChatGPT on various questions regarding educational practices, programs, and policies, which we discuss below.
Try the tool here.
[Image: the AI chatbot for the Doing What Works Library]
Background
The work was inspired by the recent release of LangChain, a framework for developing tools around LLMs that has expanded the possibilities for customized educational applications and chatbots. In particular, LangChain makes it fast and easy to ground a chatbot like ChatGPT in custom data, so the model can be tailored to specific use cases.
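To make this concrete, below is a minimal sketch of the retrieval pattern LangChain supports: load custom documents, embed them into a vector index, and have ChatGPT answer questions grounded in the retrieved passages. It assumes the LangChain module paths from around the time this post was written (they have moved in newer releases), and the folder name and question are illustrative rather than our actual setup.

```python
# Minimal retrieval-augmented QA sketch with LangChain (illustrative only).
# Requires: pip install langchain openai faiss-cpu, plus OPENAI_API_KEY in the environment.
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Load and chunk a folder of plain-text source documents.
docs = DirectoryLoader("custom_docs/", glob="*.txt", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks and store them in a searchable vector index.
store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Chain a retriever to ChatGPT so answers are grounded in the retrieved passages.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=store.as_retriever(),
)

print(qa.run("What does the research say about dropout prevention?"))
```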
The team focused on the DWW Library, which is based on the U.S. Department of Education’s What Works Clearinghouse, a federal initiative that aims to bridge the gap between research and practice and answer the question “What works in education?” The DWW library serves as an informational resource for educators, providing interviews, sample curriculum materials from schools and classrooms, and other tools to guide evidence-based decision-making. The library’s content ranges from adolescent literacy to school turnaround and dropout prevention.
Put simply, the DWW library helps educators translate research findings into effective practices, which is critical in a field that often lacks or underutilizes research and data. Numerous education studies are published each year, but not all of them yield findings strong enough to translate into positive outcomes, and even the reliable ones are not easily accessible to school leaders or teachers. The DWW library provides easy-to-read materials that help educators identify research-backed practices and improve learning outcomes.
For example, if school leaders are interested in devising a schoolwide dropout prevention strategy, the DWW library can assist them in assessing whether there is enough evidence to support the proposed academic or social interventions. The library can provide bite-sized diagrams on the most effective practices to keep students on the path to graduation, interviews with leading experts on dropout prevention practices, or comprehensive guides on the research underlying these recommendations. These informational tools, available in various media formats such as videos, planning templates, and documents, can drive education forward by providing valuable information on best practices.
Methods
For this experiment, the team trained an AI chatbot on DWW’s audio transcripts, using an integration of the ChatGPT API, LangChain, and GPTIndex (a framework for connecting to external data sources).
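As a rough sketch of what that integration can look like (not our exact code), the snippet below indexes a folder of transcripts with the gpt_index package (the GPTIndex library of the time, since renamed LlamaIndex) and uses LangChain's ChatOpenAI wrapper around the ChatGPT API. Folder names and parameters are illustrative, and the class names follow that early API.

```python
# Build a queryable index over the DWW audio transcripts (illustrative sketch).
# Assumes the early gpt_index API; newer LlamaIndex releases renamed these classes.
import os
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader, LLMPredictor, PromptHelper
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # placeholder, not a real key

def build_index(transcript_dir: str) -> GPTSimpleVectorIndex:
    """Index a folder of transcripts so the chatbot can query them."""
    # Prompt sizing: 4096-token context window, 512-token answers, small chunk overlap.
    prompt_helper = PromptHelper(4096, 512, 20, chunk_size_limit=600)
    # Use the ChatGPT API (gpt-3.5-turbo) as the underlying LLM via LangChain's wrapper.
    llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7, max_tokens=512))
    documents = SimpleDirectoryReader(transcript_dir).load_data()
    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index.save_to_disk("dww_index.json")  # save so the index can be reloaded without re-embedding
    return index

index = build_index("dww_transcripts/")  # hypothetical folder of transcript .txt files
print(index.query("Tell me about Chicago Public Schools' dropout recovery program.").response)
```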
The team wanted to see if the DWW-trained chatbot could outperform the base ChatGPT in answering prompts ranging from simple data retrieval to more complex tasks like:
- Synthesize opinions from multiple source materials.
- Compare opinions coming from different source materials.
- Make judgments and provide rationale.
- Make suggestions for specific cases.
- Redirect users to additional related reading materials.
- Make recommendations.
- Deal with misspellings.
- Deal with unclear instructions.
The team compared output responses between the customized ChatGPT and the base model for prompts like:
- Tell me about Chicago Public Schools’ dropout recovery program.
- Multiple people have different ideas on how to make RtI (Response to Intervention) work. Give me a summary of their opinions and what you personally think would be the best approach.
- Compare what you’ve learned about RtI from the source materials to what you already knew.
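Below is a sketch of how such a side-by-side comparison can be scripted. It assumes the saved index from the Methods sketch above and the pre-1.0 openai Python client; the prompts come from the list above, and this is an illustration rather than the team's actual evaluation code.

```python
# Send the same prompts to the DWW-trained index and to the base ChatGPT model,
# then print the two answers side by side for qualitative comparison (illustrative sketch).
import openai
from gpt_index import GPTSimpleVectorIndex

index = GPTSimpleVectorIndex.load_from_disk("dww_index.json")  # index built earlier

prompts = [
    "Tell me about Chicago Public Schools' dropout recovery program.",
    "Compare what you've learned about RtI from the source materials to what you already knew.",
]

for prompt in prompts:
    # Answer grounded in the DWW transcripts.
    dww_answer = index.query(prompt, response_mode="compact").response
    # Answer from the base model with no custom knowledge attached.
    base_answer = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )["choices"][0]["message"]["content"]
    print(f"PROMPT: {prompt}\n\nDWW-trained: {dww_answer}\n\nBaseline: {base_answer}\n" + "-" * 60)
```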
Through this work, the team developed a robust, custom AI chatbot over the DWW database, accessible via a public URL. However, additional work is needed on further customization of the web interface, experimentation with different configurations and LLMs, and preprocessing of data formats (e.g., images) to make them suitable for chatbot training.
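One way to expose a prototype like this at a public URL is a simple Gradio front end, sketched below; this shows the general approach, not necessarily the interface we actually deployed.

```python
# Wrap the saved index in a minimal Gradio web UI (illustrative sketch).
# launch(share=True) generates a temporary public URL; a permanent deployment needs proper hosting.
import gradio as gr
from gpt_index import GPTSimpleVectorIndex

index = GPTSimpleVectorIndex.load_from_disk("dww_index.json")

def answer(question: str) -> str:
    """Route a user question through the DWW index and return the response text."""
    return index.query(question, response_mode="compact").response

gr.Interface(
    fn=answer,
    inputs=gr.Textbox(lines=4, label="Ask about evidence-based practice"),
    outputs="text",
    title="Doing What Works chatbot (prototype)",
).launch(share=True)
```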
Results
While the DWW model did not outperform the base model in answering general knowledge questions, it excelled in addressing specific inquiries related to the custom knowledge base. In preliminary testing, the DWW model demonstrated the ability to effectively synthesize and compare opinions from various source materials.
For instance, the DWW-trained chatbot could give a better answer than the baseline ChatGPT when given the prompt, “Which would be the more effective teaching method when teaching an intermediate English as a Foreign Language learner: efferent text discussion, or afferent text discussion? Provide reasons for your choice.”
- DWW-trained ChatGPT:
“The more effective teaching method when teaching an intermediate English as a Foreign Language learner would be efferent text discussion. This is because the research found that efferent discussion, which is a discussion about what the text actually says, was the most successful method for improving comprehension and critical thinking. This is especially true for average and below-average students, who may not be able to understand the text without discussing it. Afferent text discussion, which is a discussion about how the students felt about the text or how they could use it in their daily lives, is not as effective for improving comprehension and critical thinking.”
- Regular ChatGPT:
“Both efferent and afferent text discussions can be effective teaching methods for intermediate English as a Foreign Language learners, but the choice between the two depends on the specific learning goals and needs of the students. Efferent text discussion focuses on extracting information from the text and understanding its main ideas and details. This method is useful for developing reading comprehension skills and building vocabulary. It can also help students develop critical thinking skills as they analyze and evaluate the text. Afferent text discussion, on the other hand, focuses on the emotional and personal responses to the text. This method is useful for developing language fluency and building confidence in expressing opinions and ideas. It can also help students develop empathy and cultural awareness as they explore different perspectives and experiences. If the goal is to improve reading comprehension and vocabulary, efferent text discussion may be more effective. If the goal is to improve language fluency and encourage personal expression, afferent text discussion may be more effective. However, a combination of both methods can provide a well-rounded approach to teaching English as a Foreign Language.”
The DWW-trained chatbot also performed better than the baseline model when faced with prompts containing spelling errors. However, it’s important to note that the DWW model is not free from hallucination, and there are instances where it struggles to differentiate between its base knowledge (what it was pre-trained on) and the source material (the DWW library). Additionally, clear instructions are necessary for the DWW model to provide the desired answers, making it more suitable for an audience familiar with prompt formatting.
Nonetheless, this chatbot and similar tools can serve numerous useful applications for teachers and school leaders. For example, teachers can use the chatbot to support their individual lesson planning and design. They can also ask it simple questions about evidence-based strategies for effective classroom management or for creating personalized learning environments. The chatbot can also leverage its knowledge base to assess whether district policies align with the research evidence. School district leaders, for instance, can rely on the chatbot for assistance in evaluating current policy interventions and action plans, such as their school turnaround initiatives.
Overall, these advancements will encourage more real-world applications of AI within the field of education. They will also help ensure these tools are closely linked to high-quality research and data, ultimately improving educational outcomes.
– Joon Suh Choi, Perpetual Baffour, Scott Crossley, Ulrich Boser
5 thoughts on “Creating a Custom Chatbot To Give Evidence-Based Education Advice”
Question: What is the best way to adapt fourth grade math to the three major learning styles students might have?
Answer: The best way to adapt fourth grade math to the three major learning styles students might have is to provide a variety of teaching methods that cater to each learning style. For example, visual learners can benefit from diagrams, charts, and other visuals that help them understand the concepts. Auditory learners can benefit from lectures, discussions, and other activities that involve verbal communication. Kinesthetic learners can benefit from hands-on activities, such as using manipulatives or playing math games. Additionally, providing students with opportunities to work in groups or individually can help them to better understand the material.
[TL;DR: This seems like a major misuse of LLMs.]
Question: Does rigorous research support the idea that students have different learning styles?
DWW Answer: Yes, rigorous research does support the idea that students have different learning styles. Studies have shown that students have different preferences for how they learn, and that different strategies can be more effective for different students.
I’d be curious what the model said prior to training. Both ChatGPT (GPT-4) and Perplexity (not even the Pro version) had a totally different take.
Perplexity, in part: There is no rigorous research to support the idea that students have different learning styles. In fact, a comprehensive analysis of recent research on learning styles found scant evidence to clearly support the idea that outcomes are best when instructional techniques align with individuals’ learning styles. Although many people, including educators, believe learning styles are set at birth and predict both academic and career success, there is no scientific evidence to support this common myth. Furthermore, nearly all of the studies that purport to provide evidence for learning styles fail to satisfy key criteria for scientific validity. Therefore, it is important to recognize that the use of learning styles in education may be detrimental and educators may be wasting time and money on products, services, and teaching methods that are geared toward learning styles.
(Usual caveats [e.g., mention of trade names doesn’t imply endorsement] apply.)
@Matt, this learning styles stuff is fascinating and I’d love to hear the authors’ response!
I tried:
Question: Does rigorous research support the class size reduction as a strategy for improving academic outcomes?
Response: Rigorous research does not necessarily support class size reduction as a strategy for improving academic outcomes. While some studies have suggested that reducing class size can have a positive effect on student outcomes, other studies have found that the effects of class size reduction are minimal or nonexistent. Additionally, reducing class size can be costly and may not be the most effective use of resources.
So…better than the response for learning styles:)
Thanks for flagging this! It’s possible there wasn’t enough information on learning styles in the DWW library to “debunk” the myths the chatbot may have learned in its general knowledge base. I think it points to the need for larger databases covering various educational topics to customize these chatbots. This prototype is just one step in that direction.
Thanks for flagging this issue. We agree that generative AI tools and LLMs are imperfect, and more can be done to mitigate their outputting of false or inaccurate information. More quality data is needed to train these tools, and this prototype chatbot is just one step in that direction.